Michael Schumacher
Hello all,
we recently had a discussion about "Duff's Device" and what code
a modern optimizing compiler should produce for it. In case you have
never heard of "Duff's Device", you can read all about it here:
<www.lysator.liu.se/c/duffs-device.html>. In short, the author of
this code needed an efficient way to write the contents
of an array of "short"s to a video hardware register. Given
the limitations of his C compiler at that time (1983!), he came up
with the following amusing solution, which uses loop unrolling in a
very "interesting" way:
    send(to, from, count)
    register short *to, *from;
    register count;
    {
        register n=(count+7)/8;
        switch(count%8){
        case 0: do{ *to = *from++;
        case 7:     *to = *from++;
        case 6:     *to = *from++;
        case 5:     *to = *from++;
        case 4:     *to = *from++;
        case 3:     *to = *from++;
        case 2:     *to = *from++;
        case 1:     *to = *from++;
                }while(--n>0);
        }
    }
Our discussion revolved around the question of whether a "good"
compiler would be able (and "allowed", given the C semantics)
to reduce this code to just "*to = from[count - 1];".
I think it must _not_ do that (in fact, gcc 4.3 doesn't, at
least not for x86 targets), but it's pretty hard to prove.
Of course, you can nowadays ISOfy the code and declare "to" as
"volatile", but I can't see how this would change the basic
questions, which are:
a) given the "usual" (K&R, ANSI, ISO) C semantics, is it allowed
for an optimizing compiler to reduce "Duff's Device" simply to
"*to = from[count - 1];"? yes ? why : !why ;-)
b) if "yes", how can you reliably write such a "send" routine in
pure C that does what it's supposed to do, and does "volatile"
help in this regard, given that "to" is already declared
"register" (both are mutually exclusive, right?)?
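To make question b) concrete, here is a minimal sketch of what an
ISOfied version might look like. It is only an illustration under some
assumptions: the "volatile" is placed on the pointed-to short while
"register" stays on the pointer variable itself, the parameter types
(const short, int) are my choice and not part of the original, and,
like the original, it assumes count > 0.

```c
/* ISO C sketch of the "send" routine (not the original K&R code).
   "to" is a register-class pointer to a volatile short, so every
   store through it is an access to a volatile object.
   Assumption: count > 0, as in the original. */
void send(register volatile short *to, register const short *from, int count)
{
    register int n = (count + 7) / 8;   /* number of passes through the loop */

    switch (count % 8) {                /* jump into the middle of the loop */
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
```

With an ordinary variable standing in for the hardware register, only
the last value written can be inspected afterwards, so a test like
"send(&reg, src, 10)" can merely confirm that reg ends up equal to
src[9]; it cannot show how many stores were actually emitted.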
Thank you very much in advance,
mike