optimizing the switch statement in Duff's Device (casting a label, label abuse)

Discussion in 'C Programming' started by anon.asdf@gmail.com, Oct 10, 2007.

  1. Guest

    Here's a reminder of duff's device:

    /*************************************/
    #include <stdio.h>

    #define STEP 8
    #define MAX_LEN STEP*4+1
    #define SOURCE_LEN 28


    int main(void)
    {
    char source[SOURCE_LEN] =
    "abcdefghijklmnopqrstuvwxyz0";
    char destination[MAX_LEN] = {0};

    char *src_ptr;
    char *dest_ptr;
    char *lim;

    int user_input;

    printf("Enter a number between incl. 1 and %d\n"
    "To quit enter: 0<ENTER>\n", SOURCE_LEN-1);

    while ((scanf("%d", &user_input) > 0)
    && (user_input > 0)
    && (user_input < SOURCE_LEN)) {

    src_ptr = source;
    lim = source + user_input;
    dest_ptr = destination;

    switch (user_input % 8) {
    case 0: do {*dest_ptr++ = *src_ptr++;
    case 7: *dest_ptr++ = *src_ptr++;
    case 6: *dest_ptr++ = *src_ptr++;
    case 5: *dest_ptr++ = *src_ptr++;
    case 4: *dest_ptr++ = *src_ptr++;
    case 3: *dest_ptr++ = *src_ptr++;
    case 2: *dest_ptr++ = *src_ptr++;
    case 1: *dest_ptr++ = *src_ptr++;
    } while (src_ptr < lim);
    }

    #if 0 /*** alternative ***/
    switch (user_input % 8) {
    do {*dest_ptr++ = *src_ptr++;
    case 7: *dest_ptr++ = *src_ptr++;
    case 6: *dest_ptr++ = *src_ptr++;
    case 5: *dest_ptr++ = *src_ptr++;
    case 4: *dest_ptr++ = *src_ptr++;
    case 3: *dest_ptr++ = *src_ptr++;
    case 2: *dest_ptr++ = *src_ptr++;
    case 1: *dest_ptr++ = *src_ptr++;
    case 0: ; } while (src_ptr < lim);
    }
    #endif

    *dest_ptr = '\0';
    puts(destination);
    }
    return 0;
    }


    /*************************************/


    The switch itself scales horribly. At worst we encounter 8 compare-
    operations until we reach the desired case.

    Maby there is a better way of doing the switch statement: In
    particular - the argument to the switch evaluates to an integer

    between 0 and 7 (incl.) and now the question is: could we not use this
    number as an offset to immediately jump the code to

    the desired location.

    goto la_0 - (user_input % 8)
    // * sizeof("*dest_ptr++ = *src_ptr++;"instruction)
    ;


    do {*dest_ptr++ = *src_ptr++;
    la_7: *dest_ptr++ = *src_ptr++;
    la_6: *dest_ptr++ = *src_ptr++;
    la_5: *dest_ptr++ = *src_ptr++;
    la_4: *dest_ptr++ = *src_ptr++;
    la_3: *dest_ptr++ = *src_ptr++;
    la_2: *dest_ptr++ = *src_ptr++;
    la_1: *dest_ptr++ = *src_ptr++;
    la_0: ; } while (src_ptr < lim);


    The code snippet above shows the idea:
    we jump to an address relative to out pivot-label la_0.

    Example
    If (user_input % 8) == 1, we
    want to jump to "la_0 minus a few instructions" to land up at la_1.


    Can something like this be done?
    Can we get a label's address and subtract (or add) an integer to it?
    Or are there other alternatives that don't need a label...?

    Of course, there are issues involved:
    What if the compiler thinks it's "clever" :) ?
    what does the compiler do with the empty ; located at la_0 (of course
    we do not want a nop)?

    Thanks a thousand... I'm getting a cold duff from the fridge!

    Kind regards,
    Albert
     
    , Oct 10, 2007
    #1
    1. Advertising

  2. On Oct 10, 5:25 pm, wrote:
    > Here's a reminder of duff's device:
    >
    > /*************************************/
    > #include <stdio.h>
    >
    > #define STEP 8
    > #define MAX_LEN STEP*4+1
    > #define SOURCE_LEN 28
    >
    > int main(void)
    > {
    > char source[SOURCE_LEN] =
    > "abcdefghijklmnopqrstuvwxyz0";
    > char destination[MAX_LEN] = {0};
    >
    > char *src_ptr;
    > char *dest_ptr;
    > char *lim;
    >
    > int user_input;
    >
    > printf("Enter a number between incl. 1 and %d\n"
    > "To quit enter: 0<ENTER>\n", SOURCE_LEN-1);
    >
    > while ((scanf("%d", &user_input) > 0)
    > && (user_input > 0)
    > && (user_input < SOURCE_LEN)) {
    >
    > src_ptr = source;
    > lim = source + user_input;
    > dest_ptr = destination;
    >
    > switch (user_input % 8) {
    > case 0: do {*dest_ptr++ = *src_ptr++;
    > case 7: *dest_ptr++ = *src_ptr++;
    > case 6: *dest_ptr++ = *src_ptr++;
    > case 5: *dest_ptr++ = *src_ptr++;
    > case 4: *dest_ptr++ = *src_ptr++;
    > case 3: *dest_ptr++ = *src_ptr++;
    > case 2: *dest_ptr++ = *src_ptr++;
    > case 1: *dest_ptr++ = *src_ptr++;
    > } while (src_ptr < lim);
    > }
    >
    > #if 0 /*** alternative ***/
    > switch (user_input % 8) {
    > do {*dest_ptr++ = *src_ptr++;
    > case 7: *dest_ptr++ = *src_ptr++;
    > case 6: *dest_ptr++ = *src_ptr++;
    > case 5: *dest_ptr++ = *src_ptr++;
    > case 4: *dest_ptr++ = *src_ptr++;
    > case 3: *dest_ptr++ = *src_ptr++;
    > case 2: *dest_ptr++ = *src_ptr++;
    > case 1: *dest_ptr++ = *src_ptr++;
    > case 0: ; } while (src_ptr < lim);
    > }
    > #endif
    >
    > *dest_ptr = '\0';
    > puts(destination);
    > }
    > return 0;
    >
    > }
    >
    > /*************************************/
    >
    > The switch itself scales horribly. At worst we encounter 8 compare-
    > operations until we reach the desired case.


    Really? What compiler are you using as a matter of interest? It must
    be a pretty poor quality one.

    > Maby there is a better way of doing the switch statement: In
    > particular - the argument to the switch evaluates to an integer
    > between 0 and 7 (incl.) and now the question is: could we not use this
    > number as an offset to immediately jump the code to
    > the desired location.


    The compiler could obviously do it, that's what tools are for. I'm
    interested to know what compiler you've got which doesn't do what you
    say for the code given above.

    > goto la_0 - (user_input % 8)
    > // * sizeof("*dest_ptr++ = *src_ptr++;"instruction)
    > ;
    >
    > do {*dest_ptr++ = *src_ptr++;
    > la_7: *dest_ptr++ = *src_ptr++;
    > la_6: *dest_ptr++ = *src_ptr++;
    > la_5: *dest_ptr++ = *src_ptr++;
    > la_4: *dest_ptr++ = *src_ptr++;
    > la_3: *dest_ptr++ = *src_ptr++;
    > la_2: *dest_ptr++ = *src_ptr++;
    > la_1: *dest_ptr++ = *src_ptr++;
    > la_0: ; } while (src_ptr < lim);
    >
    > The code snippet above shows the idea:
    > we jump to an address relative to out pivot-label la_0.
    >
    > Example
    > If (user_input % 8) == 1, we
    > want to jump to "la_0 minus a few instructions" to land up at la_1.
    >
    > Can something like this be done?


    Not easily or efficiently in C.

    > Can we get a label's address and subtract (or add) an integer to it?
    > Or are there other alternatives that don't need a label...?


    Use Duff's device and a decent compiler.

    > Of course, there are issues involved:
    > What if the compiler thinks it's "clever" :) ?
    > what does the compiler do with the empty ; located at la_0 (of course
    > we do not want a nop)?
    >
    > Thanks a thousand... I'm getting a cold duff from the fridge!


    I'd track down a decent compiler which does the sensible thing with
    Duff's device before spending time trying to hack up some alternative.
    An old gcc on x86 does the obvious thing correctly, for example.
     
    J. J. Farrell, Oct 10, 2007
    #2
    1. Advertising

  3. On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:

    <snip>
    >
    > The switch itself scales horribly. At worst we encounter 8 compare-
    > operations until we reach the desired case.
    >

    <snip>
    >
    > Can we get a label's address and subtract (or add) an integer to it?
    > Or are there other alternatives that don't need a label...?
    >
    > Of course, there are issues involved:
    > What if the compiler thinks it's "clever" :) ?
    > what does the compiler do with the empty ; located at la_0 (of course
    > we do not want a nop)?
    >
    > Thanks a thousand... I'm getting a cold duff from the fridge!
    >
    > Kind regards,
    > Albert

    You shouldn't assume that a switch is always implemented by a
    sequence of comparisons; it might well be implemented as a jump table.

    What you seek can be done with gcc, using the labels as values
    extension, (you can store the labels in an array L say, and goto
    L[user_output%8]) but that isn't (standard) C.
     
    Duncan Muirhead, Oct 10, 2007
    #3
  4. writes:
    > Here's a reminder of duff's device:

    <snip>
    > switch (user_input % 8) {
    > case 0: do {*dest_ptr++ = *src_ptr++;
    > case 7: *dest_ptr++ = *src_ptr++;
    > case 6: *dest_ptr++ = *src_ptr++;
    > case 5: *dest_ptr++ = *src_ptr++;
    > case 4: *dest_ptr++ = *src_ptr++;
    > case 3: *dest_ptr++ = *src_ptr++;
    > case 2: *dest_ptr++ = *src_ptr++;
    > case 1: *dest_ptr++ = *src_ptr++;
    > } while (src_ptr < lim);
    > }

    <snip>
    > The switch itself scales horribly. At worst we encounter 8 compare-
    > operations until we reach the desired case.
    >
    > Maby there is a better way of doing the switch statement: In
    > particular - the argument to the switch evaluates to an integer
    >
    > between 0 and 7 (incl.) and now the question is: could we not use this
    > number as an offset to immediately jump the code to


    Yes. Most compilers will do this for you simply by giving it the
    above code. A compiler might choose to use repeated compare
    operations, but it seems like any unlikely choice in this case. If
    yours does, I'd look to change it rather than venture into
    non-standard "computed goto" territory (not that I am saying any
    compilers support it, but if they did, I would avoid it).

    --
    Ben.
     
    Ben Bacarisse, Oct 10, 2007
    #4
  5. Guest

    On Oct 10, 7:12 pm, Duncan Muirhead <> wrote:
    > On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:
    >
    > <snip>
    >
    >
    >
    > > The switch itself scales horribly. At worst we encounter 8 compare-
    > > operations until we reach the desired case.

    >
    > <snip>
    >
    > > Can we get a label's address and subtract (or add) an integer to it?
    > > Or are there other alternatives that don't need a label...?

    >
    > > Of course, there are issues involved:
    > > What if the compiler thinks it's "clever" :) ?
    > > what does the compiler do with the empty ; located at la_0 (of course
    > > we do not want a nop)?

    >
    > > Thanks a thousand... I'm getting a cold duff from the fridge!

    >
    > > Kind regards,
    > > Albert

    >
    > You shouldn't assume that a switch is always implemented by a
    > sequence of comparisons; it might well be implemented as a jump table.
    >



    Ah great! Thanks for the info.
    (I was mislead by reading http://www.geekronomicon.com/?q=node/68#switch).


    > What you seek can be done with gcc, using the labels as values
    > extension, (you can store the labels in an array L say, and goto
    > L[user_output%8]) but that isn't (standard) C.


    I'm using gcc, so I'll keep that in mind in case I need to do
    something more exotic than a switch, one day.


    Kind regards,
    Albert
     
    , Oct 10, 2007
    #5
  6. The whole thing is completely stupid.

    1. If you want to make a loop fast, use a good compiler and tell it to
    unroll the loop. This has many advantages: The compiler can make a
    good decision how often to unroll the loop. It will know where using
    more unrolling is just pointless. It knows exactly what is the best
    way to divide things up between partial and complete iterations. Using
    Duff's device, the compiler has to figure out what is happening and
    will most likely produce suboptimal code.

    For example, on a typical x86 system, with eight times unrolling,
    there would be a loop looking like dst[0]=src[0];dst[1]=src[1];...;dst
    +=8;src+=8; Only two operations incrementing pointers. Duff's device
    has a label after each *dst++=*src++; so it needs eight separate
    increments for src and dst. It would probably be possible to detect
    Duff's device and transform it back to a straightforward loop, but
    that would just be a waste of compiler developer's time.

    Now consider what happens if you have an auto-vectorising compiler. A
    straightforward loop could be transformed to vector operations and
    become ten times faster. But seeing Duff's device, the compiler will
    just shake his head in disgust and run away.

    However, in this case you could have got ten times faster code by just
    calling memcpy (). What are the advantages? I'll give you one example:
    If you compiled code for an earlier version of MacOS X running on an
    ancient G3 processor, memcpy () would execute hand-optimised assembler
    code running at maximum speed for that processor. Take the same
    executable (NOT recompiled) and run it on last years G5 processor, and
    it executes completely different code, optimised for a G5 processor.
    And now you take the same executable and run it on a Macintosh with an
    Intel processor. While Duff's device originally generated awful code
    for a PowerPC G3, which is now automatically translated to even worse
    x86 code, the memcpy () call actually executes hand-optimised x86
    assembler code!

    Summary: You can use an error-prone, unreadable construct and feel
    clever about it, while wasting your time and producing slow code, or
    you can leave the hard work to the compiler, save lots of coding and
    debugging time, and get code that executes a lot faster. Tough
    choice.
     
    christian.bau, Oct 10, 2007
    #6
  7. On Oct 10, 6:14 pm, Ben Bacarisse <> wrote:

    > Yes. Most compilers will do this for you simply by giving it the
    > above code. A compiler might choose to use repeated compare
    > operations, but it seems like any unlikely choice in this case. If
    > yours does, I'd look to change it rather than venture into
    > non-standard "computed goto" territory (not that I am saying any
    > compilers support it, but if they did, I would avoid it).


    For small numbers of consecutive values, it is much easier to do say
    four compares and eight conditional branches. Compare to 1, branch if
    less, branch if equal, compare to 3, branch if less, branch if equal,
    Compare to 5 etc. etc. n/4 compares and n/2 branches on average.
    Computed branches usually are much worse for branch prediction
    hardware. For large numbers, use a binary branch strategy with log(n)
    compares and branches in every case.
     
    christian.bau, Oct 10, 2007
    #7
  8. Eric Sosman Guest

    Re: optimizing the switch statement in Duff's Device (casting a label,label abuse)

    christian.bau wrote On 10/10/07 16:21,:
    > The whole thing is completely stupid.


    Yes to that part. However ...

    > [...] Duff's device
    > has a label after each *dst++=*src++; so it needs eight separate
    > increments for src and dst. [...]
    > [...] in this case you could have got ten times faster code by just
    > calling memcpy ().


    .... The O.P.'s code could (and should) be replaced by a
    call to memcpy(), but Duff's device could not be. See,
    for example,

    http://en.wikipedia.org/wiki/Duff's_device

    .... and let's give no more guff to Duff!

    --
     
    Eric Sosman, Oct 10, 2007
    #8
  9. Guest

    On Oct 10, 7:12 pm, Duncan Muirhead <> wrote:
    > On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:
    >
    > <snip>
    >
    >
    >
    > > The switch itself scales horribly. At worst we encounter 8 compare-
    > > operations until we reach the desired case.

    >
    > <snip>
    >
    > > Can we get a label's address and subtract (or add) an integer to it?
    > > Or are there other alternatives that don't need a label...?

    >
    > > Of course, there are issues involved:
    > > What if the compiler thinks it's "clever" :) ?
    > > what does the compiler do with the empty ; located at la_0 (of course
    > > we do not want a nop)?

    >
    > > Thanks a thousand... I'm getting a cold duff from the fridge!

    >
    > > Kind regards,
    > > Albert

    >
    > You shouldn't assume that a switch is always implemented by a
    > sequence of comparisons; it might well be implemented as a jump table.
    >
    > What you seek can be done with gcc, using the labels as values
    > extension, (you can store the labels in an array L say, and goto
    > L[user_output%8]) but that isn't (standard) C.


    Hi!

    Here's a code example using the non-standard,
    but cool gcc extensions!

    I cannot understand why this is not part of the C standard: it's
    really powerful!!

    {
    //deliberately provocative statement
    Ah who cares, gcc is the defacto standard anyway! // ;)
    }

    Regards,
    Albert

    /***************************************/

    #include <stdio.h>

    #define STEP 8
    #define MAX_LEN STEP*4+1
    #define SOURCE_LEN 28

    int main(void)
    {
    char source[SOURCE_LEN] =
    "abcdefghijklmnopqrstuvwxyz0";
    char destination[MAX_LEN] = {0};

    char *src_ptr;
    char *dest_ptr;
    char *lim;

    void *jmp;

    int user_input;

    printf("Enter a number between incl. 1 and %d\n"
    "To quit enter: 0<ENTER>\n", SOURCE_LEN-1);

    while ((scanf("%d", &user_input) > 0)
    && (user_input > 0)
    && (user_input < SOURCE_LEN)) {

    src_ptr = source;
    lim = source + user_input;
    dest_ptr = destination;

    goto *(&&la_0 -
    (user_input % 8) * (&&la_0 - && la_1));

    do { *dest_ptr++ = *src_ptr++;
    la_7: *dest_ptr++ = *src_ptr++;
    la_6: *dest_ptr++ = *src_ptr++;
    la_5: *dest_ptr++ = *src_ptr++;
    la_4: *dest_ptr++ = *src_ptr++;
    la_3: *dest_ptr++ = *src_ptr++;
    la_2: *dest_ptr++ = *src_ptr++;
    la_1: *dest_ptr++ = *src_ptr++;
    la_0: ; } while (src_ptr < lim);

    *dest_ptr = '\0';
    puts(destination);
    }
    return 0;

    }
     
    , Oct 11, 2007
    #9
  10. writes:

    > On Oct 10, 7:12 pm, Duncan Muirhead <> wrote:
    >> On Wed, 10 Oct 2007 09:25:41 -0700, anon.asdf wrote:
    >>
    >> <snip>
    >> > The switch itself scales horribly. At worst we encounter 8 compare-
    >> > operations until we reach the desired case.

    >>
    >> <snip>
    >> > Can we get a label's address and subtract (or add) an integer to it?
    >> > Or are there other alternatives that don't need a label...?

    >>

    <snip>
    >> You shouldn't assume that a switch is always implemented by a
    >> sequence of comparisons; it might well be implemented as a jump table.
    >>
    >> What you seek can be done with gcc, using the labels as values
    >> extension, (you can store the labels in an array L say, and goto
    >> L[user_output%8]) but that isn't (standard) C.

    >
    > Here's a code example using the non-standard,
    > but cool gcc extensions!


    The GCC people are indeed "cool" -- you should take their advice:

    "The switch statement is cleaner, so use that rather than an array
    [of labels] unless the problem does not fit a switch statement very
    well."

    > I cannot understand why this is not part of the C standard: it's
    > really powerful!!


    I can't understand why you would obfuscate your code like this. What
    is wrong with memcpy?

    <snip>
    > goto *(&&la_0 -
    > (user_input % 8) * (&&la_0 - && la_1));


    Way to go! You've taken a non-standard, non-portable, extension and
    used it in such a way as to rely on assumptions that even gcc does not
    guarantee. You must be aiming for minimum portability!

    > do { *dest_ptr++ = *src_ptr++;
    > la_7: *dest_ptr++ = *src_ptr++;
    > la_6: *dest_ptr++ = *src_ptr++;
    > la_5: *dest_ptr++ = *src_ptr++;
    > la_4: *dest_ptr++ = *src_ptr++;
    > la_3: *dest_ptr++ = *src_ptr++;
    > la_2: *dest_ptr++ = *src_ptr++;
    > la_1: *dest_ptr++ = *src_ptr++;
    > la_0: ; } while (src_ptr < lim);


    As someone helpfully pointed out (in a reply to a post of mine) jump
    tables are not even guaranteed to be the fastest was to implement a
    switch.

    Anyway, in this case you need memcpy.

    --
    Ben.
     
    Ben Bacarisse, Oct 11, 2007
    #10
  11. Guest

    Duncan Muirhead <> wrote:
    >
    > You shouldn't assume that a switch is always implemented by a
    > sequence of comparisons; it might well be implemented as a jump table.


    Indeed. Even Ritchie's original PDP-11 compiler from 30 years ago had
    three or four different strategies (decision tree, jump table, hash
    function, etc.) for implementing a switch depending on the number and
    distribution of the case values.

    -Larry Jones

    I take it there's no qualifying exam to be a Dad. -- Calvin
     
    , Oct 11, 2007
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jamie Risk

    Duff Device reference ...

    Jamie Risk, Oct 16, 2004, in forum: C Programming
    Replies:
    1
    Views:
    436
    Michael Wojcik
    Oct 20, 2004
  2. Christopher Benson-Manica

    A small question about Duff's Device

    Christopher Benson-Manica, Oct 21, 2004, in forum: C Programming
    Replies:
    2
    Views:
    321
    Michael Wojcik
    Oct 21, 2004
  3. Jan Richter

    duff's device / loop unriolling

    Jan Richter, Aug 19, 2005, in forum: C Programming
    Replies:
    19
    Views:
    703
    Tim Rentsch
    Aug 29, 2005
  4. Hallvard B Furuseth

    Duff's Device

    Hallvard B Furuseth, Sep 28, 2006, in forum: C Programming
    Replies:
    11
    Views:
    654
    Hallvard B Furuseth
    Oct 3, 2006
  5. yawnmoth

    how does duff's device work?

    yawnmoth, Nov 9, 2008, in forum: C Programming
    Replies:
    7
    Views:
    338
    Phil Carmody
    Nov 9, 2008
Loading...

Share This Page