which loop is faster

sushant · Jan 3, 2005

hi all,

suppose i have 2 loops one inside the other, like this

1) for(i=0;i<=100;i++)
{
for(j=0;j<=10;j++)
{
some code;
}
}

2) for(i=0;i<=10;i++)
{
for(j=0;j<=100;j++)
{
some code;
}
}

so which loops will work faster the 1st one or the 2nd one.

thnx in advance
sushant

Shan · Jan 3, 2005

Please dont ask interview questions in CLC....

Jens.Toerring · Jan 3, 2005

sushant said:
suppose i have 2 loops one inside the other, like this

1) for(i=0;i<=100;i++)
{
for(j=0;j<=10;j++)
{
some code;
}
}

2) for(i=0;i<=10;i++)
{
for(j=0;j<=100;j++)
{
some code;
}
}

so which loops will work faster the 1st one or the 2nd one.

Nobody can tell but you after you made careful measurements. There's
nothing in the sppecifications of the C language that would require
one of the loops to be faster than the other. If one of them should
be faster than it's due to the way the implementation, i.e. the C com-
piler, is written and may also depend on what you're doing in the loop.
You can only find out by measuring what happens. Unfortunately, that
result will only be valid for the compiler/system combination you did
the tests with.
Regards, Jens

EventHelix.com · Jan 3, 2005

It will really depend upon "some code".
If "some code" just fits into the cache but the outer loop doesn't:
(2) would be faster.

If the inner and the outer loop fit into the cache, I don't expect to
see
a large difference in performance.

Deepa

Neo · Jan 3, 2005

EventHelix.com said:
It will really depend upon "some code".
If "some code" just fits into the cache but the outer loop doesn't:
(2) would be faster.

Whtz da meaning of outer loop doesn't, here?
-Neo

Neo · Jan 3, 2005

sushant said:
hi all,

suppose i have 2 loops one inside the other, like this

1) for(i=0;i<=100;i++)
{
for(j=0;j<=10;j++)
{
some code;
}
}

2) for(i=0;i<=10;i++)
{
for(j=0;j<=100;j++)
{
some code;
}
}

so which loops will work faster the 1st one or the 2nd one.

thnx in advance
sushant

Implementation Defined!
How your target compiler optimize the code... and the target platform
you are compiling your code for...

-Neo

infobahn · Jan 3, 2005

Neo said:
Implementation Defined!

Not in the C sense of that term. Compilers are not required to
document their choice in this case.

xarax · Jan 3, 2005

Neo said:
Whtz da meaning of outer loop doesn't, here?
-Neo

Seems obvious from the next piece:

If the outer loop doesn't fit within the cache
line and the inner does fit, then there is a
performance hit each time the inner loop exits.

If both inner and outer loops fit entirely
within a cache line and the compiler has properly
placed the code within the cache line, then there
should be no significant difference between the
two examples.

--
----------------------------
Jeffrey D. Smith
Farsight Systems Corporation
24 BURLINGTON DRIVE
LONGMONT, CO 80501-6906
http://www.farsight-systems.com
z/Debug debugs your Systems/C programs running on IBM z/OS for FREE!

Lew Pitcher · Jan 3, 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

hi all,

suppose i have 2 loops one inside the other, like this

1) for(i=0;i<=100;i++)
{
for(j=0;j<=10;j++)
{
some code;
}
}

2) for(i=0;i<=10;i++)
{
for(j=0;j<=100;j++)
{
some code;
}
}

so which loops will work faster the 1st one or the 2nd one.

Sorry, but as others have pointed out, this question cannot be answered
properly within the confines of standard C. The C language makes no
distinctions between these two examples, and the only operational distinction
is made by your specific compiler. In some cases, both examples will have
equal execution time, while in other cases, one example will have different
time than the other.

However, at the risk of being accused of a 'premature optimization', I'd guess
that the average 'dumb' compiler implementation would generate 'faster' code
for the second example than the first.

In the first example, the outer loop is initialized once, but the inner loop
is initialized 100 times.

In the second example, the outer loop is still initialize once, but the inner
loop is initialized 10 times.

Assuming a discernable execution penalty is imposed for loop initialization,
the first example will incur 10 times more penalty than the second example would.

- --
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFB2WoSagVFX4UWr64RAsvlAJ9rWtroUaYuINsUd5hQuzPUorRAzwCfcFp+
G48dsa76llsXr09cKpD8SkA=
=5T2N
-----END PGP SIGNATURE-----

KiLVaiDeN · Jan 3, 2005

I would say that generally it's better to make the loops with the more
occurences the more inside.

Only looking at memory talking about indexes :

[loop 1000]
[loop 10]
This configuration means doing 10 times memory access for the inner loop,
then 1 for outter loop, and so on :
10
1
10
1
....1000 times

[loop 10]
[loop 1000]
This one, means doing 1000 memory access to first index, then 1 to outter

1000
1
1000
1
....10 times

Looking this, we can see that on the second case, there is fewer memory
access to different positions.
On the first case, we go from one point of memory to the other all the time.
Meaning that if the memory is swapped for some reason, then first one will
be slower.

My 2 cents, tell me if I'm wrong

K

Jonathan Bartlett · Jan 3, 2005

Looking this, we can see that on the second case, there is fewer memory
access to different positions.
On the first case, we go from one point of memory to the other all the time.
Meaning that if the memory is swapped for some reason, then first one will
be slower.

Of course, the looped could just be unrolled by the compiler...

Jon

Christian Bau · Jan 3, 2005

hi all,

suppose i have 2 loops one inside the other, like this

1) for(i=0;i<=100;i++)
{
for(j=0;j<=10;j++)
{
some code;
}
}

2) for(i=0;i<=10;i++)
{
for(j=0;j<=100;j++)
{
some code;
}
}

so which loops will work faster the 1st one or the 2nd one.

some code;

is not legal C. Replace it with some legal C code, then the answer will
depend on the C code and in the implementation.

To get a correct answer, you'll have to measure it.

E. Robert Tisdale · Jan 4, 2005

Shan said:
Please don't ask interview questions in CLC.

Please don't answer interview questions in CLC.

Jack Klein · Jan 4, 2005

On 3 Jan 2005 04:11:39 -0800, "EventHelix.com" <[email protected]>
wrote in comp.lang.c:

Either figure out how to do one of the following:

1. Get the new Google groups reply to quote context.

2. Quote the context yourself.

3. If you can't do either of the above, use a real newsreader. Free
Agent, Mozilla Thunderbird, and Gravity are all free and quite good.

It will really depend upon "some code".
If "some code" just fits into the cache but the outer loop doesn't:
(2) would be faster.

If the inner and the outer loop fit into the cache, I don't expect to
see
a large difference in performance.

Deepa

What's a 'cache'? Where is it defined in the C standard? I write C
code for quite a few platforms that don't have any such thing.

So what's your answer if the OP is writing code for a platform without
a cache?

Stan Milam · Jan 4, 2005

KiLVaiDeN said:
I would say that generally it's better to make the loops with the more
occurences the more inside.

Only looking at memory talking about indexes :

[loop 1000]
[loop 10]
This configuration means doing 10 times memory access for the inner loop,
then 1 for outter loop, and so on :
10
1
10
1
...1000 times

[loop 10]
[loop 1000]
This one, means doing 1000 memory access to first index, then 1 to outter

1000
1
1000
1
...10 times

Looking this, we can see that on the second case, there is fewer memory
access to different positions.
On the first case, we go from one point of memory to the other all the time.
Meaning that if the memory is swapped for some reason, then first one will
be slower.

My 2 cents, tell me if I'm wrong

K

I do not know if you are wrong or not. What keeps popping into my head
is the commutive property of mathematics: (10 * 100) == (100 * 10). But
who can really say? After all, there are a plethora of C compilers out
there and they will all emit binary code differently, for whatever
platform they are targeted for.

Neo · Jan 4, 2005

Stan Milam said:
KiLVaiDeN said:

I would say that generally it's better to make the loops with the more
occurences the more inside.

Only looking at memory talking about indexes :

[loop 1000]
[loop 10]
This configuration means doing 10 times memory access for the inner loop,
then 1 for outter loop, and so on :
10
1
10
1
...1000 times

[loop 10]
[loop 1000]
This one, means doing 1000 memory access to first index, then 1 to outter

1000
1
1000
1
...10 times

Looking this, we can see that on the second case, there is fewer memory
access to different positions.
On the first case, we go from one point of memory to the other all the
time.
Meaning that if the memory is swapped for some reason, then first one
will
be slower.

My 2 cents, tell me if I'm wrong

K

Click to expand...

I do not know if you are wrong or not. What keeps popping into my head is
the commutive property of mathematics: (10 * 100) == (100 * 10). But who
can really say? After all, there are a plethora of C compilers out there
and they will all emit binary code differently, for whatever platform they
are targeted for.

Yes, mathematically dis is true.

When it comes to relams of C compilers its totally up to the compiler how it
generates the machine instructions to do the same. and dat also depends on,
for which target machine you are going to generate dat code. Its properties
affect the binary code generation like : word length, chache - if any the
processor has, address n data buses (von newman vs. harvard architecture) n
of couse, how compiler optimize the code while exploiting all these
available facilites.

One important thing - Standard doesn't make any assumptions about it.

It is obvious by closely analysing the two loops that memory access will be
more in first case
[100]
....[10]

as compared to
[10]
....[100]

'coz inner loop breaks 10 times more and intialization has to be performed
for loop control variable 10 times more, assuming, all other things being
constant.

Is that make sense?

-Neo

Nick Keighley · Jan 4, 2005

Neo wrote:

Yes, mathematically dis is true.

When it comes to relams of C compilers its totally up to the compiler how it
generates the machine instructions to do the same. and dat also depends on,
for which target machine you are going to generate dat code. Its properties
affect the binary code generation like : word length, chache - if any the
processor has, address n data buses (von newman vs. harvard architecture) n
of couse, how compiler optimize the code while exploiting all these
available facilites.

One important thing - Standard doesn't make any assumptions about it.

It is obvious by closely analysing the two loops that memory access will be
more in first case
[100]
...[10]

as compared to
[10]
...[100]

'coz inner loop breaks 10 times more and intialization has to be performed
for loop control variable 10 times more, assuming, all other things being
constant.

Is that make sense?

-Neo

I normally avoid spelling flames but...

dis -> this
its -> it's (in at least one case)
dat -> that
coz -> because
I've skipped a couple of genuine typos

petermcmillan_uk · Jan 4, 2005

Neo said:
Yes, mathematically dis is true.

When it comes to relams of C compilers its totally up to the compiler how it
generates the machine instructions to do the same. and dat also depends on,
for which target machine you are going to generate dat code. Its properties
affect the binary code generation like : word length, chache - if any the
processor has, address n data buses (von newman vs. harvard architecture) n
of couse, how compiler optimize the code while exploiting all these
available facilites.

One important thing - Standard doesn't make any assumptions about it.

It is obvious by closely analysing the two loops that memory access will be
more in first case
[100]
...[10]

as compared to
[10]
...[100]

'coz inner loop breaks 10 times more and intialization has to be performed
for loop control variable 10 times more, assuming, all other things being
constant.

Is that make sense?

-Neo

OK, I just had to find out the truth, so I compiled the program and
looked at the disassemly. The results seem to be pretty much the same.
Take a look :

# For loop 1
00411A8F mov dword ptr ,0
00411A96 jmp main+71h (411AA1h)
00411A98 mov eax,dword ptr
00411A9B add eax,1
00411A9E mov dword ptr ,eax
00411AA1 cmp dword ptr ,0Ah
00411AA5 jg main+0A0h (411AD0h)

# for loop 2
00411AA7 mov dword ptr [j],0
00411AAE jmp main+89h (411AB9h)
00411AB0 mov eax,dword ptr [j]
00411AB3 add eax,1
00411AB6 mov dword ptr [j],eax
00411AB9 cmp dword ptr [j],64h
00411ABD jg main+9Eh (411ACEh)

# the code (printf("") in this case)
00411ABF push offset string "" (4240C8h)
00411AC4 call @ILT+1170(_printf) (411497h)
00411AC9 add esp,4

# end of loop 2
00411ACC jmp main+80h (411AB0h)

# end of loop 1
00411ACE jmp main+68h (411A98h)

That's one way, and if it's done the other way then the only difference
will be the parameters used for the for looping. The thing that makes
it loop is jmp, and in both cases it will jmp the same number of times.
The other parts of the for loops have exactly the same instructions,
but with different parameters.
I think we can conclude that neither of these loops will be faster.

Lawrence Kirby · Jan 4, 2005

hi all,

suppose i have 2 loops one inside the other, like this

1) for(i=0;i<=100;i++)

This is suspicious because it will loop 101 times,

{
for(j=0;j<=10;j++)

and this will loop 11 times. If you expected 100 and 10 then make the
comparison a < rather than a <= . That's common and normal in C as we tend
to count from 0 a lot especially when array indexing is concerned. A 100
element array has elements indexed from 0 to 99 and trying to access an
element at position 100 would be an error.

{
some code;
}
}
}
2) for(i=0;i<=10;i++)
{
for(j=0;j<=100;j++)
{
some code;
}
}
}
so which loops will work faster the 1st one or the 2nd one.

It is impossible to say. It will depend on your compiler (and the options
you use with it), the processor used to run the code and not least on what
exactly "some code" is. If "some code" uses i and/or j then the two
samples are not equivalent and it doesn't make much sense to talk about
which is faster. That's also the case if i and/or j are used after the
loops.

Lawrence

Grumble · Jan 4, 2005

Jack said:
What's a 'cache'? Where is it defined in the C standard?

Isn't it a place where you can hide your keywords?

Difference between using "let" in a "for" loop	0	Jul 3, 2022
Index Error during backpropagation in a multilayer neural network.	1	Jun 17, 2023
Python Gurobi Optimizing Cost has no errors but I get no sensible solution	0	Aug 30, 2022
which one runs faster ??	17	May 30, 2007
Boomer trying to learn coding in C and C++	6	Dec 16, 2022
While Loop Freezing?	1	Feb 20, 2021
Quick sort algorithm	1	Feb 22, 2023
Addition and substraction of polynomials is working fine but the multiplication isn't; what's wrong with my code	1	Nov 22, 2022

which loop is faster

sushant

Shan

Jens.Toerring

EventHelix.com

Neo

Neo

infobahn

xarax

Lew Pitcher

KiLVaiDeN

Jonathan Bartlett

Christian Bau

E. Robert Tisdale

Jack Klein

Stan Milam

Neo

Nick Keighley

petermcmillan_uk

Lawrence Kirby

Grumble

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads