SM Ryan said:
# #define DUFF_DEVICE_8(macroCount, macroAction) \
With a modern, optimising compiler, it's a bad idea.
As with most widely known programming techniques, whether
it makes sense to employ Duff's Device in some particular
circumstances depends on the circumstances.
Compilers can do unrolling for you.
The question, however, usually is not whether some
hypothetical compiler *can* but whether some actual compiler
*does*. These questions often have different answers.
Duff's Device is going to make the
code so complicated that it will prevent the compiler from
doing a number of additional optimisations.
Use of Duff's Device may complicate the code to the point
where it *might* prevent the compiler from doing additional
useful optimizations, but there's no guarantee of that. By
now Duff's Device is well enough known that advanced
compilers, which use structural analysis for their flow
analysis, may very well recognize the particular form of
multiple-entry loop that Duff's Device uses and deal with it
appropriately.
More significantly, the loop body that is being unrolled is
usually very simple (otherwise why are we bothering to
unroll it?); often it won't be improved beyond what
is already done in the multiple-entry loop form.
If you insist
on unrolling your loops, do something like
    for (i=0; i+8<n; i+=8) {
        body(i+0);
        body(i+1);
        body(i+2);
        body(i+3);
        body(i+4);
        body(i+5);
        body(i+6);
        body(i+7);
    }
    switch (n-i) {
        case 7: body(i); i++;
        case 6: body(i); i++;
        case 5: body(i); i++;
        case 4: body(i); i++;
        case 3: body(i); i++;
        case 2: body(i); i++;
        case 1: body(i); i++;
    }
This leaves the aggregate loop body in an optimisable form.
Minor correction: the code shown is wrong. If n is a
multiple of eight (and greater than zero), only n-8 calls
to body are made rather than n. (There are at least two
easy fixes, left as exercises for the reader.)
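For anyone who would rather check than take my word for it, here is
a minimal sketch of one possible fix: the loop test is changed from
i+8<n to i+8<=n, so that a final full group of eight is handled by
the loop and the switch only ever sees 0 to 7 leftovers. The
recording body, the seen array, MAX_SEEN and the wrapper name
run_for_then_switch are hypothetical scaffolding of mine, not part
of the quoted code.

```c
#include <assert.h>

/* Hypothetical scaffolding: a body that records the indices it is given. */
#define MAX_SEEN 64
static int seen[MAX_SEEN];
static int seen_count = 0;

static void body(int i)
{
    assert(seen_count < MAX_SEEN);
    seen[seen_count++] = i;
}

/* For-then-switch unrolling with the loop test changed to i+8 <= n. */
static void run_for_then_switch(int n)
{
    int i;

    seen_count = 0;
    for (i = 0; i + 8 <= n; i += 8) {
        body(i + 0);
        body(i + 1);
        body(i + 2);
        body(i + 3);
        body(i + 4);
        body(i + 5);
        body(i + 6);
        body(i + 7);
    }
    switch (n - i) {            /* 0 to 7 iterations remain here */
    case 7: body(i); i++;       /* each case falls through to the next */
    case 6: body(i); i++;
    case 5: body(i); i++;
    case 4: body(i); i++;
    case 3: body(i); i++;
    case 2: body(i); i++;
    case 1: body(i); i++;
    }
}
```

Calling run_for_then_switch(16) and then inspecting seen_count is
enough to confirm that the multiple-of-eight case now gets all of
its calls.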
As noted before, often the compiler won't be able
to optimize the code in the 'for' beyond what it would
do in the corresponding expression in Duff's Device.
The technique of incrementing i by 8, and using 'i+0', etc,
in the unrolled loop body, is a good one to know; however,
this technique can also be used with Duff's Device:
    switch( i = n % 8 ) do {
                i += 8;
                body(i-8);
        case 7: body(i-7);
        case 6: body(i-6);
        case 5: body(i-5);
        case 4: body(i-4);
        case 3: body(i-3);
        case 2: body(i-2);
        case 1: body(i-1);
        case 0: ;
    } while( i < n );
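To see that the multiple-entry form visits every index exactly
once, the fragment can be wrapped in a function and given a body
that records its argument. What follows is only a sketch, written
with non-negative n in mind; the recording body, the seen array,
MAX_SEEN and the name run_duff are hypothetical names of mine. The
switch jumps into the middle of the do-while to handle the n % 8
leftovers first, and every later pass through the loop does a full
group of eight.

```c
#include <assert.h>

/* Hypothetical scaffolding: a body that records the indices it is given. */
#define MAX_SEEN 64
static int seen[MAX_SEEN];
static int seen_count = 0;

static void body(int i)
{
    assert(seen_count < MAX_SEEN);
    seen[seen_count++] = i;
}

/* Duff's Device with i advanced by 8 per pass and fixed offsets in the body. */
static void run_duff(int n)
{
    int i;

    seen_count = 0;
    switch (i = n % 8) do {     /* first entry lands on the leftover count */
                i += 8;
                body(i - 8);
        case 7: body(i - 7);    /* each case falls through to the next */
        case 6: body(i - 6);
        case 5: body(i - 5);
        case 4: body(i - 4);
        case 3: body(i - 3);
        case 2: body(i - 2);
        case 1: body(i - 1);
        case 0: ;
    } while (i < n);
}
```

With n equal to 10, for example, the switch enters at case 2, so the
first partial pass covers indices 0 and 1 and the single full pass
that follows covers 2 through 9.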
If you compare the generated code for the two approaches, I
expect you'll find cases where the generated code for the
unrolled-using-switch approach is better than the generated
code for the for-then-switch approach, along every important
axis. (That is what I found with some simple loop 'body's.)
Certainly it doesn't make sense to use Duff's Device in all
cases. Most commonly, loops shouldn't be unrolled at all,
because the benefit that might come from unrolling just
isn't significant (and may very well be negative rather than
positive). But when it is important to unroll a loop,
Duff's Device is one possible way to do that; its
use should be considered, compared and evaluated along with
the alternatives. Other things being equal, code that
runs faster and takes fewer lines seems like the better
choice. In cases where Duff's Device produces such code,
there is a pretty strong argument that it's the right
approach to use in those circumstances.
In short, I don't think Duff's Device is either always good
or always bad. It's just a technique to know and compare
against other possible approaches; whether it should be
used or not depends on how it stacks up against the other
possibilities, in the particular circumstances being
investigated.