optimizing the switch statement in Duff's Device (casting a label, label abuse)

A

anon.asdf

Here's a reminder of duff's device:

/*************************************/
#include <stdio.h>

#define STEP 8
#define MAX_LEN STEP*4+1
#define SOURCE_LEN 28


int main(void)
{
char source[SOURCE_LEN] =
"abcdefghijklmnopqrstuvwxyz0";
char destination[MAX_LEN] = {0};

char *src_ptr;
char *dest_ptr;
char *lim;

int user_input;

printf("Enter a number between incl. 1 and %d\n"
"To quit enter: 0<ENTER>\n", SOURCE_LEN-1);

while ((scanf("%d", &user_input) > 0)
&& (user_input > 0)
&& (user_input < SOURCE_LEN)) {

src_ptr = source;
lim = source + user_input;
dest_ptr = destination;

switch (user_input % 8) {
case 0: do {*dest_ptr++ = *src_ptr++;
case 7: *dest_ptr++ = *src_ptr++;
case 6: *dest_ptr++ = *src_ptr++;
case 5: *dest_ptr++ = *src_ptr++;
case 4: *dest_ptr++ = *src_ptr++;
case 3: *dest_ptr++ = *src_ptr++;
case 2: *dest_ptr++ = *src_ptr++;
case 1: *dest_ptr++ = *src_ptr++;
} while (src_ptr < lim);
}

#if 0 /*** alternative ***/
switch (user_input % 8) {
do {*dest_ptr++ = *src_ptr++;
case 7: *dest_ptr++ = *src_ptr++;
case 6: *dest_ptr++ = *src_ptr++;
case 5: *dest_ptr++ = *src_ptr++;
case 4: *dest_ptr++ = *src_ptr++;
case 3: *dest_ptr++ = *src_ptr++;
case 2: *dest_ptr++ = *src_ptr++;
case 1: *dest_ptr++ = *src_ptr++;
case 0: ; } while (src_ptr < lim);
}
#endif

*dest_ptr = '\0';
puts(destination);
}
return 0;
}


/*************************************/


The switch itself scales horribly. At worst we encounter 8 compare-
operations until we reach the desired case.

Maby there is a better way of doing the switch statement: In
particular - the argument to the switch evaluates to an integer

between 0 and 7 (incl.) and now the question is: could we not use this
number as an offset to immediately jump the code to

the desired location.

goto la_0 - (user_input % 8)
// * sizeof("*dest_ptr++ = *src_ptr++;"instruction)
;


do {*dest_ptr++ = *src_ptr++;
la_7: *dest_ptr++ = *src_ptr++;
la_6: *dest_ptr++ = *src_ptr++;
la_5: *dest_ptr++ = *src_ptr++;
la_4: *dest_ptr++ = *src_ptr++;
la_3: *dest_ptr++ = *src_ptr++;
la_2: *dest_ptr++ = *src_ptr++;
la_1: *dest_ptr++ = *src_ptr++;
la_0: ; } while (src_ptr < lim);


The code snippet above shows the idea:
we jump to an address relative to out pivot-label la_0.

Example
If (user_input % 8) == 1, we
want to jump to "la_0 minus a few instructions" to land up at la_1.


Can something like this be done?
Can we get a label's address and subtract (or add) an integer to it?
Or are there other alternatives that don't need a label...?

Of course, there are issues involved:
What if the compiler thinks it's "clever" :) ?
what does the compiler do with the empty ; located at la_0 (of course
we do not want a nop)?

Thanks a thousand... I'm getting a cold duff from the fridge!

Kind regards,
Albert
 
J

J. J. Farrell

Here's a reminder of duff's device:

/*************************************/
#include <stdio.h>

#define STEP 8
#define MAX_LEN STEP*4+1
#define SOURCE_LEN 28

int main(void)
{
char source[SOURCE_LEN] =
"abcdefghijklmnopqrstuvwxyz0";
char destination[MAX_LEN] = {0};

char *src_ptr;
char *dest_ptr;
char *lim;

int user_input;

printf("Enter a number between incl. 1 and %d\n"
"To quit enter: 0<ENTER>\n", SOURCE_LEN-1);

while ((scanf("%d", &user_input) > 0)
&& (user_input > 0)
&& (user_input < SOURCE_LEN)) {

src_ptr = source;
lim = source + user_input;
dest_ptr = destination;

switch (user_input % 8) {
case 0: do {*dest_ptr++ = *src_ptr++;
case 7: *dest_ptr++ = *src_ptr++;
case 6: *dest_ptr++ = *src_ptr++;
case 5: *dest_ptr++ = *src_ptr++;
case 4: *dest_ptr++ = *src_ptr++;
case 3: *dest_ptr++ = *src_ptr++;
case 2: *dest_ptr++ = *src_ptr++;
case 1: *dest_ptr++ = *src_ptr++;
} while (src_ptr < lim);
}

#if 0 /*** alternative ***/
switch (user_input % 8) {
do {*dest_ptr++ = *src_ptr++;
case 7: *dest_ptr++ = *src_ptr++;
case 6: *dest_ptr++ = *src_ptr++;
case 5: *dest_ptr++ = *src_ptr++;
case 4: *dest_ptr++ = *src_ptr++;
case 3: *dest_ptr++ = *src_ptr++;
case 2: *dest_ptr++ = *src_ptr++;
case 1: *dest_ptr++ = *src_ptr++;
case 0: ; } while (src_ptr < lim);
}
#endif

*dest_ptr = '\0';
puts(destination);
}
return 0;

}

/*************************************/

The switch itself scales horribly. At worst we encounter 8 compare-
operations until we reach the desired case.

Really? What compiler are you using as a matter of interest? It must
be a pretty poor quality one.
Maby there is a better way of doing the switch statement: In
particular - the argument to the switch evaluates to an integer
between 0 and 7 (incl.) and now the question is: could we not use this
number as an offset to immediately jump the code to
the desired location.

The compiler could obviously do it, that's what tools are for. I'm
interested to know what compiler you've got which doesn't do what you
say for the code given above.
goto la_0 - (user_input % 8)
// * sizeof("*dest_ptr++ = *src_ptr++;"instruction)
;

do {*dest_ptr++ = *src_ptr++;
la_7: *dest_ptr++ = *src_ptr++;
la_6: *dest_ptr++ = *src_ptr++;
la_5: *dest_ptr++ = *src_ptr++;
la_4: *dest_ptr++ = *src_ptr++;
la_3: *dest_ptr++ = *src_ptr++;
la_2: *dest_ptr++ = *src_ptr++;
la_1: *dest_ptr++ = *src_ptr++;
la_0: ; } while (src_ptr < lim);

The code snippet above shows the idea:
we jump to an address relative to out pivot-label la_0.

Example
If (user_input % 8) == 1, we
want to jump to "la_0 minus a few instructions" to land up at la_1.

Can something like this be done?

Not easily or efficiently in C.
Can we get a label's address and subtract (or add) an integer to it?
Or are there other alternatives that don't need a label...?

Use Duff's device and a decent compiler.
Of course, there are issues involved:
What if the compiler thinks it's "clever" :) ?
what does the compiler do with the empty ; located at la_0 (of course
we do not want a nop)?

Thanks a thousand... I'm getting a cold duff from the fridge!

I'd track down a decent compiler which does the sensible thing with
Duff's device before spending time trying to hack up some alternative.
An old gcc on x86 does the obvious thing correctly, for example.
 
D

Duncan Muirhead

On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:

The switch itself scales horribly. At worst we encounter 8 compare-
operations until we reach the desired case.
Can we get a label's address and subtract (or add) an integer to it?
Or are there other alternatives that don't need a label...?

Of course, there are issues involved:
What if the compiler thinks it's "clever" :) ?
what does the compiler do with the empty ; located at la_0 (of course
we do not want a nop)?

Thanks a thousand... I'm getting a cold duff from the fridge!

Kind regards,
Albert
You shouldn't assume that a switch is always implemented by a
sequence of comparisons; it might well be implemented as a jump table.

What you seek can be done with gcc, using the labels as values
extension, (you can store the labels in an array L say, and goto
L[user_output%8]) but that isn't (standard) C.
 
B

Ben Bacarisse

Here's a reminder of duff's device:
switch (user_input % 8) {
case 0: do {*dest_ptr++ = *src_ptr++;
case 7: *dest_ptr++ = *src_ptr++;
case 6: *dest_ptr++ = *src_ptr++;
case 5: *dest_ptr++ = *src_ptr++;
case 4: *dest_ptr++ = *src_ptr++;
case 3: *dest_ptr++ = *src_ptr++;
case 2: *dest_ptr++ = *src_ptr++;
case 1: *dest_ptr++ = *src_ptr++;
} while (src_ptr < lim);
}
The switch itself scales horribly. At worst we encounter 8 compare-
operations until we reach the desired case.

Maby there is a better way of doing the switch statement: In
particular - the argument to the switch evaluates to an integer

between 0 and 7 (incl.) and now the question is: could we not use this
number as an offset to immediately jump the code to

Yes. Most compilers will do this for you simply by giving it the
above code. A compiler might choose to use repeated compare
operations, but it seems like any unlikely choice in this case. If
yours does, I'd look to change it rather than venture into
non-standard "computed goto" territory (not that I am saying any
compilers support it, but if they did, I would avoid it).
 
A

anon.asdf

On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:

<snip>









You shouldn't assume that a switch is always implemented by a
sequence of comparisons; it might well be implemented as a jump table.


Ah great! Thanks for the info.
(I was mislead by reading http://www.geekronomicon.com/?q=node/68#switch).

What you seek can be done with gcc, using the labels as values
extension, (you can store the labels in an array L say, and goto
L[user_output%8]) but that isn't (standard) C.

I'm using gcc, so I'll keep that in mind in case I need to do
something more exotic than a switch, one day.


Kind regards,
Albert
 
C

christian.bau

The whole thing is completely stupid.

1. If you want to make a loop fast, use a good compiler and tell it to
unroll the loop. This has many advantages: The compiler can make a
good decision how often to unroll the loop. It will know where using
more unrolling is just pointless. It knows exactly what is the best
way to divide things up between partial and complete iterations. Using
Duff's device, the compiler has to figure out what is happening and
will most likely produce suboptimal code.

For example, on a typical x86 system, with eight times unrolling,
there would be a loop looking like dst[0]=src[0];dst[1]=src[1];...;dst
+=8;src+=8; Only two operations incrementing pointers. Duff's device
has a label after each *dst++=*src++; so it needs eight separate
increments for src and dst. It would probably be possible to detect
Duff's device and transform it back to a straightforward loop, but
that would just be a waste of compiler developer's time.

Now consider what happens if you have an auto-vectorising compiler. A
straightforward loop could be transformed to vector operations and
become ten times faster. But seeing Duff's device, the compiler will
just shake his head in disgust and run away.

However, in this case you could have got ten times faster code by just
calling memcpy (). What are the advantages? I'll give you one example:
If you compiled code for an earlier version of MacOS X running on an
ancient G3 processor, memcpy () would execute hand-optimised assembler
code running at maximum speed for that processor. Take the same
executable (NOT recompiled) and run it on last years G5 processor, and
it executes completely different code, optimised for a G5 processor.
And now you take the same executable and run it on a Macintosh with an
Intel processor. While Duff's device originally generated awful code
for a PowerPC G3, which is now automatically translated to even worse
x86 code, the memcpy () call actually executes hand-optimised x86
assembler code!

Summary: You can use an error-prone, unreadable construct and feel
clever about it, while wasting your time and producing slow code, or
you can leave the hard work to the compiler, save lots of coding and
debugging time, and get code that executes a lot faster. Tough
choice.
 
C

christian.bau

Yes. Most compilers will do this for you simply by giving it the
above code. A compiler might choose to use repeated compare
operations, but it seems like any unlikely choice in this case. If
yours does, I'd look to change it rather than venture into
non-standard "computed goto" territory (not that I am saying any
compilers support it, but if they did, I would avoid it).

For small numbers of consecutive values, it is much easier to do say
four compares and eight conditional branches. Compare to 1, branch if
less, branch if equal, compare to 3, branch if less, branch if equal,
Compare to 5 etc. etc. n/4 compares and n/2 branches on average.
Computed branches usually are much worse for branch prediction
hardware. For large numbers, use a binary branch strategy with log(n)
compares and branches in every case.
 
E

Eric Sosman

christian.bau wrote On 10/10/07 16:21,:
The whole thing is completely stupid.

Yes to that part. However ...
[...] Duff's device
has a label after each *dst++=*src++; so it needs eight separate
increments for src and dst. [...]
[...] in this case you could have got ten times faster code by just
calling memcpy ().

.... The O.P.'s code could (and should) be replaced by a
call to memcpy(), but Duff's device could not be. See,
for example,

http://en.wikipedia.org/wiki/Duff's_device

.... and let's give no more guff to Duff!
 
A

anon.asdf

On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:

<snip>


The switch itself scales horribly. At worst we encounter 8 compare-
operations until we reach the desired case.

Can we get a label's address and subtract (or add) an integer to it?
Or are there other alternatives that don't need a label...?
Of course, there are issues involved:
What if the compiler thinks it's "clever" :) ?
what does the compiler do with the empty ; located at la_0 (of course
we do not want a nop)?
Thanks a thousand... I'm getting a cold duff from the fridge!
Kind regards,
Albert

You shouldn't assume that a switch is always implemented by a
sequence of comparisons; it might well be implemented as a jump table.

What you seek can be done with gcc, using the labels as values
extension, (you can store the labels in an array L say, and goto
L[user_output%8]) but that isn't (standard) C.

Hi!

Here's a code example using the non-standard,
but cool gcc extensions!

I cannot understand why this is not part of the C standard: it's
really powerful!!

{
//deliberately provocative statement
Ah who cares, gcc is the defacto standard anyway! // ;)
}

Regards,
Albert

/***************************************/

#include <stdio.h>

#define STEP 8
#define MAX_LEN STEP*4+1
#define SOURCE_LEN 28

int main(void)
{
char source[SOURCE_LEN] =
"abcdefghijklmnopqrstuvwxyz0";
char destination[MAX_LEN] = {0};

char *src_ptr;
char *dest_ptr;
char *lim;

void *jmp;

int user_input;

printf("Enter a number between incl. 1 and %d\n"
"To quit enter: 0<ENTER>\n", SOURCE_LEN-1);

while ((scanf("%d", &user_input) > 0)
&& (user_input > 0)
&& (user_input < SOURCE_LEN)) {

src_ptr = source;
lim = source + user_input;
dest_ptr = destination;

goto *(&&la_0 -
(user_input % 8) * (&&la_0 - && la_1));

do { *dest_ptr++ = *src_ptr++;
la_7: *dest_ptr++ = *src_ptr++;
la_6: *dest_ptr++ = *src_ptr++;
la_5: *dest_ptr++ = *src_ptr++;
la_4: *dest_ptr++ = *src_ptr++;
la_3: *dest_ptr++ = *src_ptr++;
la_2: *dest_ptr++ = *src_ptr++;
la_1: *dest_ptr++ = *src_ptr++;
la_0: ; } while (src_ptr < lim);

*dest_ptr = '\0';
puts(destination);
}
return 0;

}
 
B

Ben Bacarisse

On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:
You shouldn't assume that a switch is always implemented by a
sequence of comparisons; it might well be implemented as a jump table.

What you seek can be done with gcc, using the labels as values
extension, (you can store the labels in an array L say, and goto
L[user_output%8]) but that isn't (standard) C.

Here's a code example using the non-standard,
but cool gcc extensions!

The GCC people are indeed "cool" -- you should take their advice:

"The switch statement is cleaner, so use that rather than an array
[of labels] unless the problem does not fit a switch statement very
well."
I cannot understand why this is not part of the C standard: it's
really powerful!!

I can't understand why you would obfuscate your code like this. What
is wrong with memcpy?

goto *(&&la_0 -
(user_input % 8) * (&&la_0 - && la_1));

Way to go! You've taken a non-standard, non-portable, extension and
used it in such a way as to rely on assumptions that even gcc does not
guarantee. You must be aiming for minimum portability!
do { *dest_ptr++ = *src_ptr++;
la_7: *dest_ptr++ = *src_ptr++;
la_6: *dest_ptr++ = *src_ptr++;
la_5: *dest_ptr++ = *src_ptr++;
la_4: *dest_ptr++ = *src_ptr++;
la_3: *dest_ptr++ = *src_ptr++;
la_2: *dest_ptr++ = *src_ptr++;
la_1: *dest_ptr++ = *src_ptr++;
la_0: ; } while (src_ptr < lim);

As someone helpfully pointed out (in a reply to a post of mine) jump
tables are not even guaranteed to be the fastest was to implement a
switch.

Anyway, in this case you need memcpy.
 
L

lawrence.jones

Duncan Muirhead said:
You shouldn't assume that a switch is always implemented by a
sequence of comparisons; it might well be implemented as a jump table.

Indeed. Even Ritchie's original PDP-11 compiler from 30 years ago had
three or four different strategies (decision tree, jump table, hash
function, etc.) for implementing a switch depending on the number and
distribution of the case values.

-Larry Jones

I take it there's no qualifying exam to be a Dad. -- Calvin
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,480
Members
44,900
Latest member
Nell636132

Latest Threads

Top