question

R

ranjeet.gupta

Dear Readers !!

What does C exactly means for loop unrolling ?

Thanks in Advance
Ranjeet
 
R

Robert Gamble

Dear Readers !!

What does C exactly means for loop unrolling ?

Loop unrolling is a technique used by compilers to reduce the number of
iterations though of a loop construct by duplicating the body of the
loop with the goal of improving speed and producing the potential for
other optimizations such as pipelining. This is not specific to C, or
part of the Standard, and is off-topic here. Check out
http://en.wikipedia.org/wiki/Loop_unrolling or use google for more
details.

Robert Gamble
 
S

SM Ryan

(e-mail address removed) wrote:
# Dear Readers !!
#
# What does C exactly means for loop unrolling ?

I don't know anything specific about C unless you want to use Duff's device, but
loop unrolling refers to replicating the body of a loop and reducing the total
number of iterations.

for (i=0; i<n; i++) {
body(i);
}

can be unrolled by a factor of 4 to

for (i=0; i+3<n; i+=4) {
body(i+0);
body(i+1);
body(i+2);
body(i+3);
}
for (; i<n; i++) {
body(i); // last 1 to 3 iterations.
}

This can reduce loop overhead and can open new optimisation possibilities
if the individual instructions of body(i+0);...;body(i+3); can be
intermingled.


For example if memcpy knows it's moving word aligned source and destination
on 4 byte words

for (i=0; i<n; i++) dd = ss;

can be unrolled to

for (i=0; i+3<n; i+=4) {
dd[i+0] = ss[i+0];
dd[i+1] = ss[i+1];
dd[i+2] = ss[i+2];
dd[i+3] = ss[i+3];
}
for (; i<n; i++) dd = ss;

and might then be optimised to

if ((dd&3)==0 && (ss&3)==0) {
for (i=0; i+3<n; i+=4) {
*((int*)(dd+i)) = *((int*)(ss+i));
}
}else if ((dd&1)==0 && (ss&1)==0) {
for (i=0; i+1<n; i+=2) {
*((short*)(dd+i)) = *((short*)(ss+i));
}
}else {
for (i=0; i+3<n; i+=4) {
dd[i+0] = ss[i+0];
dd[i+1] = ss[i+1];
dd[i+2] = ss[i+2];
dd[i+3] = ss[i+3];
}
}
for (; i<n; i++) dd = ss;

and run up to four times as fast.
 
S

SM Ryan

# other optimizations such as pipelining. This is not specific to C, or
# part of the Standard, and is off-topic here. Check out

http://catb.org/~esr/jargon/html/D/Duffs-device.html
quote


register n = (count + 7) / 8; /* count > 0 assumed */
switch (count % 8)
case 0: do { *to++ = *from++;
case 7: *to++ = *from++;
case 6: *to++ = *from++;
case 5: *to++ = *from++;
case 4: *to++ = *from++;
case 3: *to++ = *from++;
case 2: *to++ = *from++;
case 1: *to++ = *from++;
} while (--n > 0);

Shocking though it appears to all who encounter it for the first time,
the device is actually perfectly valid, legal C. C's default fall
through in case statements has long been its most controversial
single feature; Duff observed that Ã’This code forms some sort of
argument in that debate, but I'm not sure whether it's for or against.Ó
 
R

ranjeet.gupta

SM said:
(e-mail address removed) wrote:
# Dear Readers !!
#
# What does C exactly means for loop unrolling ?

I don't know anything specific about C unless you want to use Duff's device, but
loop unrolling refers to replicating the body of a loop and reducing the total
number of iterations.

for (i=0; i<n; i++) {
body(i);
}

can be unrolled by a factor of 4 to

So it means that the loop unrooling is the compiler dependent ?
means varry from compiler to compiler, Leading to the various
factors of unrolled i.e may be 3, may be 4, may be 5 ..on
diffrent diffrent compilers ??
for (i=0; i+3<n; i+=4) {
body(i+0);
body(i+1);
body(i+2);
body(i+3);
}
for (; i<n; i++) {
body(i); // last 1 to 3 iterations.
}

This can reduce loop overhead and can open new optimisation possibilities
if the individual instructions of body(i+0);...;body(i+3); can be
intermingled.


For example if memcpy knows it's moving word aligned source and destination
on 4 byte words

for (i=0; i<n; i++) dd = ss;

can be unrolled to

for (i=0; i+3<n; i+=4) {
dd[i+0] = ss[i+0];
dd[i+1] = ss[i+1];
dd[i+2] = ss[i+2];
dd[i+3] = ss[i+3];
}
for (; i<n; i++) dd = ss;

and might then be optimised to

if ((dd&3)==0 && (ss&3)==0) {
for (i=0; i+3<n; i+=4) {
*((int*)(dd+i)) = *((int*)(ss+i));
}
}else if ((dd&1)==0 && (ss&1)==0) {
for (i=0; i+1<n; i+=2) {
*((short*)(dd+i)) = *((short*)(ss+i));
}
}else {
for (i=0; i+3<n; i+=4) {
dd[i+0] = ss[i+0];
dd[i+1] = ss[i+1];
dd[i+2] = ss[i+2];
dd[i+3] = ss[i+3];
}
}
for (; i<n; i++) dd = ss;

and run up to four times as fast.
 
R

ranjeet.gupta

SM said:
(e-mail address removed) wrote:
# Dear Readers !!
#
# What does C exactly means for loop unrolling ?

I don't know anything specific about C unless you want to use Duff's device, but
loop unrolling refers to replicating the body of a loop and reducing the total
number of iterations.

for (i=0; i<n; i++) {
body(i);
}

can be unrolled by a factor of 4 to

So it means that the loop unrooling is the compiler dependent ?
means varry from compiler to compiler, Leading to the various
factors of unrolled i.e may be 3, may be 4, may be 5 ..on
diffrent diffrent compilers ??
for (i=0; i+3<n; i+=4) {
body(i+0);
body(i+1);
body(i+2);
body(i+3);
}
for (; i<n; i++) {
body(i); // last 1 to 3 iterations.
}

This can reduce loop overhead and can open new optimisation possibilities
if the individual instructions of body(i+0);...;body(i+3); can be
intermingled.


For example if memcpy knows it's moving word aligned source and destination
on 4 byte words

for (i=0; i<n; i++) dd = ss;

can be unrolled to

for (i=0; i+3<n; i+=4) {
dd[i+0] = ss[i+0];
dd[i+1] = ss[i+1];
dd[i+2] = ss[i+2];
dd[i+3] = ss[i+3];
}
for (; i<n; i++) dd = ss;

and might then be optimised to

if ((dd&3)==0 && (ss&3)==0) {
for (i=0; i+3<n; i+=4) {
*((int*)(dd+i)) = *((int*)(ss+i));
}
}else if ((dd&1)==0 && (ss&1)==0) {
for (i=0; i+1<n; i+=2) {
*((short*)(dd+i)) = *((short*)(ss+i));
}
}else {
for (i=0; i+3<n; i+=4) {
dd[i+0] = ss[i+0];
dd[i+1] = ss[i+1];
dd[i+2] = ss[i+2];
dd[i+3] = ss[i+3];
}
}
for (; i<n; i++) dd = ss;

and run up to four times as fast.
 
R

Robert Gamble

So it means that the loop unrooling is the compiler dependent ?
means varry from compiler to compiler, Leading to the various
factors of unrolled i.e may be 3, may be 4, may be 5 ..on
diffrent diffrent compilers ??

Some compilers may make use of loop unrolling while others might not.
Those that do use the technique may do so differently from other
compilers that use it. The Standard says nothing about this which is
why it is off-topic here.

Robert Gamble
 
S

SM Ryan

(e-mail address removed) wrote:

# So it means that the loop unrooling is the compiler dependent ?

Usually. Sometimes people do it by hand. See also Duff's Device.

# means varry from compiler to compiler, Leading to the various
# factors of unrolled i.e may be 3, may be 4, may be 5 ..on
# diffrent diffrent compilers ??

Yes. Whether it helps or hurts depends on the compiler, CPU, cache, etc.
 
K

Keith Thompson

So it means that the loop unrooling is the compiler dependent ?
means varry from compiler to compiler, Leading to the various
factors of unrolled i.e may be 3, may be 4, may be 5 ..on
diffrent diffrent compilers ??

The compiler's job is to generate code that behaves in the way
specified by source code as controlled by the language definition. If
a compiler unrolls a loop, the resulting unrolled loop has to behave
the same way it would have without the optimization (except for
variations, such as unspecified behavior, allowed by the language).

Within those constraints, the compiler can do anything it likes. The
language says nothing whatsoever about loop unrolling; it's just one
of many kinds of optimizations that a compiler can perform. The
unrolling factor can and will vary from compiler to compiler, and with
a given compiler depending on the nature of the loop, the nature of
the surrounding code, any command-line options that control
optimization, and, if the compiler writer is in a silly mood, the
phase of the moon.
 
T

Tim Rentsch

SM Ryan said:
# other optimizations such as pipelining. This is not specific to C, or
# part of the Standard, and is off-topic here. Check out

http://catb.org/~esr/jargon/html/D/Duffs-device.html
quote


register n = (count + 7) / 8; /* count > 0 assumed */
switch (count % 8)
case 0: do { *to++ = *from++;
case 7: *to++ = *from++;
case 6: *to++ = *from++;
case 5: *to++ = *from++;
case 4: *to++ = *from++;
case 3: *to++ = *from++;
case 2: *to++ = *from++;
case 1: *to++ = *from++;
} while (--n > 0);

Shocking though it appears to all who encounter it for the first time,
the device is actually perfectly valid, legal C. C's default fall
through in case statements has long been its most controversial
single feature; Duff observed that Ã’This code forms some sort of
argument in that debate, but I'm not sure whether it's for or against.Ó

A slightly different way of coding Duff's device:

n = count / 8;
switch( count % 8 ) do {
*to++ = *from++; case 7:
*to++ = *from++; case 6:
*to++ = *from++; case 5:
*to++ = *from++; case 4:
*to++ = *from++; case 3:
*to++ = *from++; case 2:
*to++ = *from++; case 1:
*to++ = *from++; case 0:;
} while( n-- > 0 );

Writing the 'case 0:' label at the end rather than the beginning
allows the code to work correctly when count >= 0 rather than just
when count > 0.

Putting the case labels on the right rather than the left makes
the code look even more bizarre than the regular Duff's Device;
I'm not sure whether to count that as a plus or a minus. :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,142
Latest member
arinsharma
Top