Question about C semantics/optimizing C compilers

  • Thread starter Michael Schumacher
  • Start date
M

Michael Schumacher

Hello all,

we recently had some discussion about "Duff's Device" and which code
a modern, optimizing compiler should produce for it. In case you
never heard of "Duff's Device", you can read all about it here:
<www.lysator.liu.se/c/duffs-device.html>. In short, the author of
this code needed some efficient piece of code to write the contents
of an array of "short"s to a video hardware register. Now, given
the limitations of his C compiler at that time (1983!), he came up
with the following amusing solution, which uses loop-unrolling in a
very "interesting" way:

send(to, from, count)
register short *to, *from;
register count;
{
register n=(count+7)/8;
switch(count%8){
case 0: do{ *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
}while(--n>0);
}
}

Now, our discussion revolved around the question whether a "good"
compiler would be able (and was "allowed" to, given the C semantics)
to reduce this code to just: "*to = from[count - 1];".

I think it must _not_ do that (actually, gcc 4.3 doesn't do it, at
least not for x86 targets), but it's actually pretty hard to prove
it. Of course, you can nowadays ISOfy the code and declare "to" as
being "volatile", but I can't see how this would change the basic
questions, which are:

a) given the "usual" (K&R, ANSI, ISO) C semantics, is it allowed
for an optimizing compiler to reduce "Duff's Device" simply to
"*to = from[count - 1];"? yes ? why : !why ;-)

b) if "yes", how can you reliably write such a "send" routine in
pure C that does what it's supposed to do, and does "volatile"
help in this regard, given that "to" is already declared
"register" (both are mutually exclusive, right?)?


Thank you very much in advance,
mike
 
U

user923005

Hello all,

we recently had some discussion about "Duff's Device" and which code
a modern, optimizing compiler should produce for it.  In case you
never heard of "Duff's Device", you can read all about it here:
<www.lysator.liu.se/c/duffs-device.html>.  In short, the author of
this code needed some efficient piece of code to write the contents
of an array of "short"s to a video hardware register.  Now, given
the limitations of his C compiler at that time (1983!), he came up
with the following amusing solution, which uses loop-unrolling in a
very "interesting" way:

        send(to, from, count)
        register short *to, *from;
        register count;
        {
                register n=(count+7)/8;
                switch(count%8){
                case 0: do{     *to = *from++;
                case 7:         *to = *from++;
                case 6:         *to = *from++;
                case 5:         *to = *from++;
                case 4:         *to = *from++;
                case 3:         *to = *from++;
                case 2:         *to = *from++;
                case 1:         *to = *from++;
                        }while(--n>0);
                }
        }

Now, our discussion revolved around the question whether a "good"
compiler would be able (and was "allowed" to, given the C semantics)
to reduce this code to just: "*to = from[count - 1];".

That's not what Duff's device does. It is a loop unrolled memset.
And yes, some compilers do loop unrolling.
I think it must _not_ do that (actually, gcc 4.3 doesn't do it, at
least not for x86 targets), but it's actually pretty hard to prove
it.  Of course, you can nowadays ISOfy the code and declare "to" as
being "volatile", but I can't see how this would change the basic
questions, which are:

   a) given the "usual" (K&R, ANSI, ISO) C semantics, is it allowed
      for an optimizing compiler to reduce "Duff's Device" simply to
      "*to = from[count - 1];"?  yes ? why : !why  ;-)

No, because of the "as-if" rule.
   b) if "yes", how can you reliably write such a "send" routine in
      pure C that does what it's supposed to do, and does "volatile"
      help in this regard, given that "to" is already declared
      "register" (both are mutually exclusive, right?)?

In theory, a compiler could turn either one of these into the other:

void loopcopy(long *to, long *from, long count);
void duffcopy(long *to, long *from, long count);

void loopcopy(long *to, long *from, long count)
{
do
*to = *from++;
while (--count > 0);
return;
}

void duffcopy(long *to, long *from, long count)
{
register n = (count + 7) / 8;
switch (count % 8)
{
case 0:
do
{
*to = *from++;
case 7:
*to = *from++;
case 6:
*to = *from++;
case 5:
*to = *from++;
case 4:
*to = *from++;
case 3:
*to = *from++;
case 2:
*to = *from++;
case 1:
*to = *from++;
}
while (--n > 0);
}
return;
}

If you have a variable that is volatile, then the compiler cannot make
many optimizing assumptions concerning that variable (for instance,
its contents might be set by an I/O port).
 
K

Keith Thompson

user923005 said:
we recently had some discussion about "Duff's Device" and which code
a modern, optimizing compiler should produce for it.  In case you
never heard of "Duff's Device", you can read all about it here:
<www.lysator.liu.se/c/duffs-device.html>.  In short, the author of
this code needed some efficient piece of code to write the contents
of an array of "short"s to a video hardware register.  Now, given
the limitations of his C compiler at that time (1983!), he came up
with the following amusing solution, which uses loop-unrolling in a
very "interesting" way:

        send(to, from, count)
        register short *to, *from;
        register count;
        {
                register n=(count+7)/8;
                switch(count%8){
                case 0: do{     *to = *from++;
                case 7:         *to = *from++;
                case 6:         *to = *from++;
                case 5:         *to = *from++;
                case 4:         *to = *from++;
                case 3:         *to = *from++;
                case 2:         *to = *from++;
                case 1:         *to = *from++;
                        }while(--n>0);
                }
        }

Now, our discussion revolved around the question whether a "good"
compiler would be able (and was "allowed" to, given the C semantics)
to reduce this code to just: "*to = from[count - 1];".

That's not what Duff's device does. It is a loop unrolled memset.
And yes, some compilers do loop unrolling.

No, the original Duff's Device was very much like what Michael posted.
The target pointer doesn't advance; it points to a memory-mapped
device output register. (Changing "*to" to "*to++" does give you an
unrolled memset.)

See question 20.35 in the comp.lang.c FAQ, <http://www.c-faq.com/>.

[...]
 
M

Michael Schumacher

user923005 said:
[code of "Duff's Device"]
Now, our discussion revolved around the question whether a "good"
compiler would be able (and was "allowed" to, given the C semantics)
to reduce this code to just: "*to = from[count - 1];".

That's not what Duff's device does. It is a loop unrolled memset.

Not really. memset() writes a constant value to a given memory area,
whereas "Duff's Device" writes the contents of an array to a constant
memory address (which happens to be a hardware register).
[May a compiler reduce Duff's Device to "*to = from[count - 1];"?]
I think it must _not_ do that (actually, gcc 4.3 doesn't do it, at
least not for x86 targets), but it's actually pretty hard to prove
it.  Of course, you can nowadays ISOfy the code and declare "to" as
being "volatile", but I can't see how this would change the basic
questions, which are:

a) given the "usual" (K&R, ANSI, ISO) C semantics, is it allowed
for an optimizing compiler to reduce "Duff's Device" simply to
"*to = from[count - 1];"?  yes ? why : !why  ;-)

No, because of the "as-if" rule.

I can farly remember this rule (any pointers to the ISO-C doc?), but
IIRC it just says that a compiler may rearrange the code in any way
it sees fit, as long as the /output/ of the program behaves as expected;
I'm all but sure that writing through a pointer qualifies as "output"...
void loopcopy(long *to, long *from, long count);
void duffcopy(long *to, long *from, long count);

void loopcopy(long *to, long *from, long count)
{
do
*to = *from++;
while (--count > 0);
return;
}

I'd rather write that as "while (count--) {...}", but okay so far.
void duffcopy(long *to, long *from, long count)
[Implements "Duff's Device" as initially posted]

But, actually. this doesn't answer my original questions:

- is it okay for an optimizing C compiler to reduce the code of
"Duff's Device" or, *especially*, your loopcopy() function above
to "*to = from[count - 1];" (please explain exactly why/why not),
despite the *intended* semantics of the code?

- if it is okay, then how can you safely implement "Duff's Device",
preserving its intended semantics (and does "volatile" help)?

I hope my questions are somewhat clearer now... ;-) Nevertheless,
thanks for your answer, which at least seems to confirm my opinion.


mike
 
W

Willem

Michael Schumacher wrote:
) <snip>
) and does "volatile"
) help in this regard, given that "to" is already declared
) "register" (both are mutually exclusive, right?)?

NB: "to", in this case, is declared as: register short *to

They are most certainly not mutually exclusive.
You see, the 'register' keyword applies to the pointer itself.
You want the 'volatile' keyword to apply to whatever it points *to*.

So, that would be: register short * volatile to
(Perhaps it's different, correct me if I'm wrong).


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
J

James Kuyper

Michael said:
user923005 said:
[May a compiler reduce Duff's Device to "*to = from[count - 1];"?]
I think it must _not_ do that (actually, gcc 4.3 doesn't do it, at
least not for x86 targets), but it's actually pretty hard to prove
it. Â Of course, you can nowadays ISOfy the code and declare "to" as
being "volatile", but I can't see how this would change the basic
questions, which are:

a) given the "usual" (K&R, ANSI, ISO) C semantics, is it allowed
for an optimizing compiler to reduce "Duff's Device" simply to
"*to = from[count - 1];"?  yes ? why : !why  ;-)
No, because of the "as-if" rule.

I can farly remember this rule (any pointers to the ISO-C doc?), ...

The "as-if" rule is a nick-name for 5.1.2.3p5. See the message I posted
about it to this newsgroup on 2009-06-23 with the title "Optimizing
compiler reordering statements".
> ... but
IIRC it just says that a compiler may rearrange the code in any way
it sees fit, as long as the /output/ of the program behaves as expected;
I'm all but sure that writing through a pointer qualifies as "output"...

Duff's device was invented before 'volatile' was. In modern terms, the
output pointer should have been declared 'volatile long*'. With that
modification, your "optimization" would no longer be legal. This is
because, at each sequence point, "previous accesses are complete and
subsequent accesses have not yet occurred." Duff's device has at least
one sequence point between every pair of accesses to *to, so each of
those accesses must in fact occur separately.
 
H

Hallvard B Furuseth

Michael said:
Hello all,

we recently had some discussion about "Duff's Device" (...)
send(to, from, count)
register short *to, *from;
register count;
(...)

Now, our discussion revolved around the question whether a "good"
compiler would be able (and was "allowed" to, given the C semantics)
to reduce this code to just: "*to = from[count - 1];".

I think it must _not_ do that (actually, gcc 4.3 doesn't do it, at
least not for x86 targets), but it's actually pretty hard to prove
it.

Because it's wrong for count <= 0, where the Duff's Device code does
not do what you want, and when (to == &from[count - 1] && count > 1).

In the latter case the last assignment does *to = *to, so you need to
get from[count-2] instead. You could optimize to this however:
if (count <=0) {
original code - well, can still be shortened but i won't bother;
} else {
*to = from[count - 1 - (to == &from[count-1] && count > 1)];
}

You could use the 'restrict' keyword to tell the compiler
to == &from[count - 1] won't happen:
void send(short *restrict to, short *restrict from, int count) { .. }

Anyway, what's needed is that the compiler can prove the effect would be
the same. Which does not seem likely a compiler could do for the
restrict-less variant at least, it doesn't look like a normal case.

gcc -O2 -std=c99 optimizes away the 1st assignment here:

void send(short *restrict to, short *restrict from)
{
*to = from[0];
*to = from[1];
}
 
B

Ben Bacarisse

Willem said:
Michael Schumacher wrote:
) <snip>
) and does "volatile"
) help in this regard, given that "to" is already declared
) "register" (both are mutually exclusive, right?)?

NB: "to", in this case, is declared as: register short *to

They are most certainly not mutually exclusive.

Quite. One (register) is a storage class specifier and the other
(volatile) is a type qualifier.
You see, the 'register' keyword applies to the pointer itself.
You want the 'volatile' keyword to apply to whatever it points *to*.

So, that would be: register short * volatile to
(Perhaps it's different, correct me if I'm wrong).

That makes 'to' volatile. To make what it points to volatile you need
the keyword before the star:

volatile short *to;

A register storage class specifier can also be added. This applies to
the whole declaration i.e. it makes 'to' have register storage class:

register volatile short *to;

[The storage class specifiers, the type qualifiers and the type
specifier ('short' in this case) can be intermingled when they both
occur at the start or the declaration (so you can write 'short
volatile register *to;' if you want). Later on, the declarator (the
bit that eventually has the name in it) can include other type
qualifiers as in your example where 'to' itself is made volatile.]
 
M

Michael Schumacher

James said:
Michael said:
user923005 said:
a) given the "usual" (K&R, ANSI, ISO) C semantics, is it allowed
for an optimizing compiler to reduce "Duff's Device" simply to
"*to = from[count - 1];"?  yes ? why : !why  ;-)
No, because of the "as-if" rule.

I can farly remember this rule (any pointers to the ISO-C doc?), ...

The "as-if" rule is a nick-name for 5.1.2.3p5. See the message I posted
about it to this newsgroup on 2009-06-23 with the title "Optimizing
compiler reordering statements".

Ah, thanks!
Duff's device was invented before 'volatile' was. In modern terms, the
output pointer should have been declared 'volatile long*'.

That should probably read "volatile short *" (16-bit I/O register), no?
With that
modification, your "optimization" would no longer be legal. This is
because, at each sequence point, "previous accesses are complete and
subsequent accesses have not yet occurred." Duff's device has at least
one sequence point between every pair of accesses to *to, so each of
those accesses must in fact occur separately.

So, IOW an optimizing compiler *could* "rewrite" Duff's Device as
"*to = from[count - 1];", *unless* "to" is declared volatile?

As someone already remarked, "count == 0" needs special treatment,
so it should at least be "if (count) *to = from[count - 1];", but I'm
still not convinced that a compiler is allowed to do that optimization.
"to" and "from" are not restricted pointers, then there is that special
"fall throu" semantics of the case branches, which imply sequence points
in between them.,,

I think it's not even okay for an optimizing compiler to reduce, e.g.

send (register short count, register short *to, register short *from)
{
while (count--)
*to = *from++;
}

to "if (count) *to = from[count - 1];" -- I've checked both gcc and icc,
and they actually don't perform this optimization. Just coincidence?


mike
 
J

James Kuyper

Michael said:
That should probably read "volatile short *" (16-bit I/O register), no?

I got the 'long' from user923005's definition of 'duffbody', which you
quoted in your message. I should have looked farther back.
With that
modification, your "optimization" would no longer be legal. This is
because, at each sequence point, "previous accesses are complete and
subsequent accesses have not yet occurred." Duff's device has at least
one sequence point between every pair of accesses to *to, so each of
those accesses must in fact occur separately.

So, IOW an optimizing compiler *could* "rewrite" Duff's Device as
"*to = from[count - 1];", *unless* "to" is declared volatile?

That's my understanding.
As someone already remarked, "count == 0" needs special treatment,
so it should at least be "if (count) *to = from[count - 1];", but I'm
still not convinced that a compiler is allowed to do that optimization.
"to" and "from" are not restricted pointers, then there is that special
"fall throu" semantics of the case branches, which imply sequence points
in between them.,,

The key point is that (without volatile) the observable behavior of the
optimized code would be exactly the same as is required by the C
standard. The point of 5.1.2.3p5 is that it's only the observable
behavior that matters, not how it is achieved (though, as I
acknowledged in my message on the other thread, the standard doesn't
actually use the term "observable" to explain the rule).
I think it's not even okay for an optimizing compiler to reduce, e.g.

send (register short count, register short *to, register short *from)
{
while (count--)
*to = *from++;
}

to "if (count) *to = from[count - 1];" -- I've checked both gcc and icc,
and they actually don't perform this optimization. Just coincidence?

Because the simpler form is equivalent to the original one, such code is
almost always the result of an error, or a deliberate attempt to write a
busy-wait loop (which, since the optimization is permitted, is also
arguably an error). Therefore, performing the optimization either
carries no real benefit, or removes the whole point of the original
code. Therefore, I wouldn't attach any special significance to the fact
that any specific compilers doesn't perform it.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top