duff's device / loop unriolling

Discussion in 'C Programming' started by Jan Richter, Aug 19, 2005.

  1. Jan Richter

    Jan Richter Guest

    Hi there,

    the Code below shows DJBs own implementation of strlen (str_len):

    unsigned int str_len(char *s)
    {
    register char *t;
    t = s;
    for (;;) {
    if (!*t) return t - s; ++t;
    if (!*t) return t - s; ++t;
    if (!*t) return t - s; ++t;
    if (!*t) return t - s; ++t;
    }
    }
    I understand the code so far, but I have a question about the loops.
    To me, it seems he used loop unrolling (aka duff's device) to optimize
    the code while it is executed. But why did he use four loops? When
    the function is invoked, he didn't know how big "s" is. Or am I wrong here?
    I always thought, to unroll a loop I need to know how often the loop is
    used.

    Any help would be appreciated,
    JR
     
    Jan Richter, Aug 19, 2005
    #1
    1. Advertising

  2. Jan Richter wrote:
    > Hi there,
    >
    > the Code below shows DJBs own implementation of strlen (str_len):
    >
    > unsigned int str_len(char *s)
    > {
    > register char *t;
    > t = s;
    > for (;;) {
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > }
    > }
    > I understand the code so far, but I have a question about the loops.
    > To me, it seems he used loop unrolling (aka duff's device) to optimize
    > the code while it is executed. But why did he use four loops? When
    > the function is invoked, he didn't know how big "s" is. Or am I wrong here?
    > I always thought, to unroll a loop I need to know how often the loop is
    > used.
    >
    > Any help would be appreciated,
    > JR
    >
    >


    In my linbox, strlen() gives by far better results...

    --
    one's freedom stops where others' begin

    Giannis Papadopoulos
    http://dop.users.uth.gr/
    University of Thessaly
    Computer & Communications Engineering dept.
     
    Giannis Papadopoulos, Aug 19, 2005
    #2
    1. Advertising

  3. Jan Richter

    Jan Richter Guest

    "Giannis Papadopoulos" wrote:

    [...]

    >
    > In my linbox, strlen() gives by far better results...
    >


    Yes, same for me, so where is the benefit?

    Cheers,
    JR
     
    Jan Richter, Aug 19, 2005
    #3
  4. Jan Richter wrote:
    > "Giannis Papadopoulos" wrote:
    >
    > [...]
    >
    >
    >>In my linbox, strlen() gives by far better results...
    >>

    >
    >
    > Yes, same for me, so where is the benefit?
    >
    > Cheers,
    > JR
    >
    >


    No benefit... Maybe it is written for a compiler that does not know how
    to unroll loops...

    And yes, loop unrolling works perfectly when you have a hint how many
    times this loop will be executed.

    Doesn't the author of this peculiar str_len() say anything?

    --
    one's freedom stops where others' begin

    Giannis Papadopoulos
    http://dop.users.uth.gr/
    University of Thessaly
    Computer & Communications Engineering dept.
     
    Giannis Papadopoulos, Aug 19, 2005
    #4
  5. Jan Richter

    Guest

    Jan Richter wrote:
    >But why did he use four loops? When
    >the function is invoked, he didn't know how big "s" is.


    may be based on the consideration of pipeline.
    I heard Intel PII CPU has four levels pipelines of integer operation.
    is that right?
     
    , Aug 19, 2005
    #5
  6. Jan Richter

    Richard Bos Guest

    "Jan Richter" <> wrote:

    > "Giannis Papadopoulos" wrote:
    >
    > > In my linbox, strlen() gives by far better results...

    >
    > Yes, same for me, so where is the benefit?


    The original Duff's Device did a bit more than merely count. It's in the
    FAQ, btw, which you should've read.
    <http://www.eskimo.com/~scs/C-faq/top.html>.

    Richard
     
    Richard Bos, Aug 19, 2005
    #6
  7. Jan Richter

    Netocrat Guest

    On Fri, 19 Aug 2005 11:53:21 +0200, Jan Richter wrote:

    > Hi there,
    >
    > the Code below shows DJBs own implementation of strlen (str_len):
    >
    > unsigned int str_len(char *s)
    > {
    > register char *t;
    > t = s;
    > for (;;) {
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > }
    > }
    > I understand the code so far, but I have a question about the loops.
    > To me, it seems he used loop unrolling (aka duff's device) to optimize
    > the code while it is executed.


    As someone else has said most optimising compilers will do this for you.
    It's a far more appropriate technique for assembly. But perhaps the
    author had a specific implementation in mind for which this happened to
    work best. Also as pointed out elsewhere Duff's Device is a more specific
    term than loop unrolling.

    > But why did he use four loops? When the function is invoked, he didn't
    > know how big "s" is. Or am I wrong here? I always thought, to unroll a
    > loop I need to know how often the loop is used.


    There are two main considerations: speed vs code size.

    More unrolling == less jump opcodes == more loop body code.

    Which is equivalent to a speed increase at the cost of a code size
    increase.

    If you know how often the loop body will be iterated on average then you
    can avoid unrolling the loop by more than that number. Hence you avoid
    the extra code without affecting the speed increase.

    If you don't know, then you make a decision as to how much extra code
    you're willing to tolerate. There are other issues like keeping all the
    loop body code in the instruction cache and possibly others that I'm not
    too familiar with not being an assembly coder.

    For the strlen function as you point out you can't know what the average
    number of iterations will be so the only consideration is code size.
    Providing you can keep it all in the instruction cache the loop could be
    unrolled indefinitely and continue to provide speed increases for calls
    with a sufficiently long string.

    Why did the author choose four? Who knows - perhaps he wanted the
    function to be reasonably small whilst still providing a modest speed
    increase. Which as you and others point out, it doesn't. Going to show
    once again that usually optimisations are best left to the compiler or
    assembly.

    --
    http://members.dodo.com.au/~netocrat
     
    Netocrat, Aug 19, 2005
    #7
  8. Jan Richter

    pete Guest

    Jan Richter wrote:
    >
    > Hi there,
    >
    > the Code below shows DJBs own implementation of strlen (str_len):
    >
    > unsigned int str_len(char *s)
    > {
    > register char *t;
    > t = s;
    > for (;;) {
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > }
    > }


    Why only four if statements in the loop?

    I guess that's not supposed to be an example of portable code.
    The (t-s) expressions are of type ptrdiff_t,
    which is unsuitable for use that way, in portable code.

    That's not Duff's Device.
    http://www.lysator.liu.se/c/duffs-device.html

    --
    pete
     
    pete, Aug 19, 2005
    #8
  9. Giannis Papadopoulos <> wrote:

    > No benefit... Maybe it is written for a compiler that does not know how
    > to unroll loops...


    Probably; an implementation simple-minded enough to trust the
    programmer when he uses the "register" storage class specifier
    probably could use some help unrolling a loop. The VAX compiler for
    which Duff originally wrote his device was (presumably) such an
    implementation.

    --
    Christopher Benson-Manica | I *should* know what I'm talking about - if I
    ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
     
    Christopher Benson-Manica, Aug 19, 2005
    #9
  10. "Jan Richter" <> writes:
    > the Code below shows DJBs own implementation of strlen (str_len):
    >
    > unsigned int str_len(char *s)
    > {
    > register char *t;
    > t = s;
    > for (;;) {
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > }
    > }
    > I understand the code so far, but I have a question about the loops.
    > To me, it seems he used loop unrolling (aka duff's device) to optimize
    > the code while it is executed. But why did he use four loops? When
    > the function is invoked, he didn't know how big "s" is. Or am I wrong here?
    > I always thought, to unroll a loop I need to know how often the loop is
    > used.


    Duff's device is a specific technique, not a synonym for loop
    unrolling. See question 20.35 of the C FAQ.

    The use of unsigned int is questionable. The return type should be
    size_t. It's possible that this implementation predates the existence
    of size_t (though it's been modernized with a prototype). I would
    guess that it might be more efficient than strlen() only on a very old
    implementation.

    A matter of terminology: it doesn't have four loops. It has a single
    loop containing four if statements.

    The statement that's executed N times is

    if (!*t) return t - s; ++t;

    N needn't be a multiple of 4 because it can break out of the loop
    partway through the body of the for loop.

    In the following, I'll use the term "iteration" for an execution of
    the entire body of the loop, and "sub-iteration" for an execution of a
    single line within the loop.

    The idea of loop unrolling is that it avoids the overhead of a test on
    each iteration, falling through from one statement to the next and
    performing the test and branch only once every 4 elements. The number
    of times a loop is to be unrolled is a tradeoff between speed and code
    size -- and if it's unrolled too much (say, 1024 times), the code size
    itself can make it run slower due to cache issues. This is all
    *extremely* system-specific, which is why the whole thing is best left
    to the compiler.

    Furthermore, the "beauty" of Duff's Device is that it jumps into the
    middle of the loop, so the loop never has to exit from the middle, and
    it really does avoid the overhead on each sub-iteration. The code
    above doesn't do this; instead, it performs a test and branch on each
    sub-iteration.

    I'd be surprised to see the above code performing better than strlen()
    on any modern implementation, particularly since an implementation is
    free to implement strlen() using whatever non-portable tricks it likes
    to squeeze out the last clock cycle.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Aug 19, 2005
    #10
  11. In article <>,
    "Jan Richter" <> wrote:

    > Hi there,
    >
    > the Code below shows DJBs own implementation of strlen (str_len):
    >
    > unsigned int str_len(char *s)
    > {
    > register char *t;
    > t = s;
    > for (;;) {
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > if (!*t) return t - s; ++t;
    > }
    > }


    > I understand the code so far, but I have a question about the loops.
    > To me, it seems he used loop unrolling (aka duff's device) to optimize
    > the code while it is executed.


    Loop unrolling != Duff's device.

    Duff's device is a combination of loop unrolling + completely perverted
    usage of a switch statement, which is likely to prevent the compiler
    from optimising the code

    > But why did he use four loops? When
    > the function is invoked, he didn't know how big "s" is. Or am I wrong here?
    > I always thought, to unroll a loop I need to know how often the loop is
    > used.


    As you can see, knowing how often a loop will be executed is not
    necessary. However, before you unroll a loop by hand, you should check
    whether the compiler can do loop unrolling itself, which is likely to
    produce more efficient code.
     
    Christian Bau, Aug 19, 2005
    #11
  12. Jan Richter

    Netocrat Guest

    pete <> wrote:
    > Keith Thompson wrote:
    >> "Jan Richter" <> writes:
    >> > the Code below shows DJBs own implementation of strlen (str_len):
    >> >
    >> > unsigned int str_len(char *s)
    >> > {
    >> > register char *t;
    >> > t = s;
    >> > for (;;) {
    >> > if (!*t) return t - s; ++t;
    >> > if (!*t) return t - s; ++t;
    >> > if (!*t) return t - s; ++t;
    >> > if (!*t) return t - s; ++t;
    >> > }
    >> > }

    >
    >> The idea of loop unrolling is that it avoids the overhead of a test on
    >> each iteration,

    >
    > That's supposed to be the idea,
    > but the above code has one test per increment
    > just like a naive portable implementation.
    >
    > size_t str_len(const char *s)
    > {
    > size_t n = 0;
    >
    > while (s[n] != '\0') {
    > ++n;
    > }
    > return n;
    > }


    IANAAP (I am not an assembly programmer) so I don't know how the code
    typically translates, but it seems to me that it does provide some benefit
    - it reduces by 75%[1] the number of jump statements executed (to return
    from the end to the start of the loop).

    [1] When the number of iterations is not a multiple of 4 this is only
    approximate.

    --
    http://members.dodo.com.au/~netocrat
     
    Netocrat, Aug 19, 2005
    #12
  13. Jan Richter

    pete Guest

    Keith Thompson wrote:
    >
    > "Jan Richter" <> writes:
    > > the Code below shows DJBs own implementation of strlen (str_len):
    > >
    > > unsigned int str_len(char *s)
    > > {
    > > register char *t;
    > > t = s;
    > > for (;;) {
    > > if (!*t) return t - s; ++t;
    > > if (!*t) return t - s; ++t;
    > > if (!*t) return t - s; ++t;
    > > if (!*t) return t - s; ++t;
    > > }
    > > }


    > The idea of loop unrolling is that it avoids the overhead of a test on
    > each iteration,


    That's supposed to be the idea,
    but the above code has one test per increment
    just like a naive portable implementation.

    size_t str_len(const char *s)
    {
    size_t n = 0;

    while (s[n] != '\0') {
    ++n;
    }
    return n;
    }


    --
    pete
     
    pete, Aug 19, 2005
    #13
  14. Jan Richter

    Randy Howard Guest

    Christian Bau wrote
    (in article
    <>):

    > In article <>,
    > "Jan Richter" <> wrote:
    >
    >> Hi there,
    >>
    >> the Code below shows DJBs own implementation of strlen (str_len):
    >>
    >> unsigned int str_len(char *s)
    >> {
    >> register char *t;
    >> t = s;
    >> for (;;) {
    >> if (!*t) return t - s; ++t;
    >> if (!*t) return t - s; ++t;
    >> if (!*t) return t - s; ++t;
    >> if (!*t) return t - s; ++t;
    >> }
    >> }

    >
    >> I understand the code so far, but I have a question about the loops.
    >> To me, it seems he used loop unrolling (aka duff's device) to optimize
    >> the code while it is executed.

    >
    > Loop unrolling != Duff's device.
    >
    > Duff's device is a combination of loop unrolling + completely perverted
    > usage of a switch statement, which is likely to prevent the compiler
    > from optimising the code


    It's suddenly being discussed again, probably as a result of an
    article in DDJ (Doctor Dobb's Journal) about it in the August
    issue. Ralf Holly proposed using it in macro form for generic
    loop unrolling, which probably makes people misunderstand its
    original purpose.

    He proposed something like

    #define DUFF_DEVICE_8(macroCount, macroAction) \
    do { \
    size_t duffcount = (macroCount); \
    size_t dufftimes = (duffcount + 7) >>3u; \
    switch(duffcount & 7) { \
    case 0: do { macroAction; \
    case 7: macroAction; \
    case 6: macroAction; \
    case 5: macroAction; \
    case 4: macroAction; \
    case 3: macroAction; \
    case 2: macroAction; \
    case 1: macroAction; \
    } while (--dufftimes > 0); \
    } \
    } while (0)

    Of course, the caller has to know not to call with a 0
    countvalue or it will execute 8 times instead.

    Probably not all that practical in general use, but you might
    find somewhere, (after profiling) where it makes some sense.

    --
    Randy Howard (2reply remove FOOBAR)
     
    Randy Howard, Aug 19, 2005
    #14
  15. In article <>, Randy Howard <> writes:
    >
    > It's suddenly being discussed again, probably as a result of an
    > article in DDJ (Doctor Dobb's Journal) about it in the August
    > issue. Ralf Holly proposed using it in macro form for generic
    > loop unrolling, which probably makes people misunderstand its
    > original purpose.
    >
    > He proposed something like


    [Tabs converted to spaces to avoid wrapping.]

    > #define DUFF_DEVICE_8(macroCount, macroAction) \
    > do { \
    > size_t duffcount = (macroCount); \
    > size_t dufftimes = (duffcount + 7) >>3u; \
    > switch(duffcount & 7) { \
    > case 0: do { macroAction; \
    > case 7: macroAction; \
    > case 6: macroAction; \
    > case 5: macroAction; \
    > case 4: macroAction; \
    > case 3: macroAction; \
    > case 2: macroAction; \
    > case 1: macroAction; \
    > } while (--dufftimes > 0); \
    > } \
    > } while (0)
    >
    > Of course, the caller has to know not to call with a 0
    > countvalue or it will execute 8 times instead.


    Unless I'm missing something, that's easily fixed with a

    if (!duffcount) break;

    before the switch. (Testing duffcount avoids using macroCount,
    which might have side effects, twice, of course.)

    A worse problem would be using it with a negative count; hopefully
    the compiler would provide a useful diagnostic for the conversion
    between a signed value and a size_t when duffcount is initialized,
    but that's a QoI issue. (Also, I know far too many C programmers
    who routinely ignore such diagnostics, partly because their code is
    full of them. I suppose that's a QoP issue.)

    > Probably not all that practical in general use, but you might
    > find somewhere, (after profiling) where it makes some sense.


    Agreed. I have found cases where relatively recent commercial
    implementations don't unroll even simple loops where unrolling has a
    significant benefit. Of course, if the loop isn't in a performance-
    critical path, it doesn't matter anyway.

    --
    Michael Wojcik

    We are subdued to what we work in. (E M Forster)
     
    Michael Wojcik, Aug 20, 2005
    #15
  16. Jan Richter

    Randy Howard Guest

    Michael Wojcik wrote
    (in article <>):

    >
    > In article <>, Randy Howard
    > <> writes:
    >>
    >> It's suddenly being discussed again, probably as a result of an
    >> article in DDJ (Doctor Dobb's Journal) about it in the August
    >> issue. Ralf Holly proposed using it in macro form for generic
    >> loop unrolling, which probably makes people misunderstand its
    >> original purpose.
    >>
    >> He proposed something like

    >
    > [Tabs converted to spaces to avoid wrapping.]


    Ack, sorry about that. I recently changed newsreaders, and
    didn't pick up on it not doing that magically in the new
    version. I'll be more careful in the future.

    --
    Randy Howard (2reply remove FOOBAR)
     
    Randy Howard, Aug 23, 2005
    #16
  17. Jan Richter

    Tim Rentsch Guest

    (Michael Wojcik) writes:

    > In article <>, Randy Howard <> writes:
    > >
    > > It's suddenly being discussed again, probably as a result of an
    > > article in DDJ (Doctor Dobb's Journal) about it in the August
    > > issue. Ralf Holly proposed using it in macro form for generic
    > > loop unrolling, which probably makes people misunderstand its
    > > original purpose.
    > >
    > > He proposed something like

    >
    > [Tabs converted to spaces to avoid wrapping.]
    >
    > > #define DUFF_DEVICE_8(macroCount, macroAction) \
    > > do { \
    > > size_t duffcount = (macroCount); \
    > > size_t dufftimes = (duffcount + 7) >>3u; \
    > > switch(duffcount & 7) { \
    > > case 0: do { macroAction; \
    > > case 7: macroAction; \
    > > case 6: macroAction; \
    > > case 5: macroAction; \
    > > case 4: macroAction; \
    > > case 3: macroAction; \
    > > case 2: macroAction; \
    > > case 1: macroAction; \
    > > } while (--dufftimes > 0); \
    > > } \
    > > } while (0)
    > >
    > > Of course, the caller has to know not to call with a 0
    > > countvalue or it will execute 8 times instead.

    >
    > Unless I'm missing something, that's easily fixed with a
    >
    > if (!duffcount) break;
    >
    > before the switch. (Testing duffcount avoids using macroCount,
    > which might have side effects, twice, of course.)



    Alternatively:

    #define DUFF_DEVICE_8(macroCount, macroAction) \
    do { \
    size_t duffcount = (macroCount); \
    size_t dufftimes = duffcount >>3u; \
    switch(duffcount & 7) { \
    do { macroAction; \
    case 7: macroAction; \
    case 6: macroAction; \
    case 5: macroAction; \
    case 4: macroAction; \
    case 3: macroAction; \
    case 2: macroAction; \
    case 1: macroAction; \
    case 0: ; \
    } while (dufftimes-- > 0); \
    } \
    } while (0)

    handles the case with zero iterations without needing an
    extra 'if' before the switch, and simplifies the calculation
    of 'dufftimes'.
     
    Tim Rentsch, Aug 26, 2005
    #17
  18. Jan Richter

    SM Ryan Guest

    # #define DUFF_DEVICE_8(macroCount, macroAction) \

    With a modern, optimising compiler, it's bad idea. Compilers
    can do unrolling for you. Duff's Device is going to make the
    code so complicated that it will prevent the compiler from
    doing a number of additional optimisations. If you insist
    on unrolling your loops, do something like

    for (i=0; i+8<n; i+=8) {
    body(i+0);
    body(i+1);
    body(i+2);
    body(i+3);
    body(i+4);
    body(i+5);
    body(i+6);
    body(i+7);
    }
    switch (n-i) {
    case 7: body(i); i++;
    case 6: body(i); i++;
    case 5: body(i); i++;
    case 4: body(i); i++;
    case 3: body(i); i++;
    case 2: body(i); i++;
    case 1: body(i); i++;
    }

    This leaves the aggregate loop body in an optimisable form.

    --
    SM Ryan http://www.rawbw.com/~wyrmwif/
    No pleasure, no rapture, no exquisite sin greater than central air.
     
    SM Ryan, Aug 26, 2005
    #18
  19. Jan Richter

    Randy Howard Guest

    SM Ryan wrote
    (in article <>):

    > # #define DUFF_DEVICE_8(macroCount, macroAction) \
    >
    > With a modern, optimising compiler, it's bad idea. Compilers
    > can do unrolling for you.


    If you read the article in DDJ, you will see that your
    assumption, although likely broadly true is not /always/ true.
    As usual, it depends.

    > Duff's Device is going to make the
    > code so complicated that it will prevent the compiler from
    > doing a number of additional optimisations.


    That is why profilers are useful. To find out, for each and
    every case, which happens to apply.


    --
    Randy Howard (2reply remove FOOBAR)
     
    Randy Howard, Aug 26, 2005
    #19
  20. Jan Richter

    Tim Rentsch Guest

    SM Ryan <> writes:

    > # #define DUFF_DEVICE_8(macroCount, macroAction) \
    >
    > With a modern, optimising compiler, it's bad idea.


    As with most widely known programming techniques, whether
    it makes sense to employ Duff's Device in some particular
    circumstances depends on the circumstances.


    > Compilers can do unrolling for you.


    The question, however, usually is not whether some
    hypothetical compiler *can* but whether some actual compiler
    *does*. These questions often have different answers.


    > Duff's Device is going to make the
    > code so complicated that it will prevent the compiler from
    > doing a number of additional optimisations.


    Use of Duff's Device may complicate the code to the point
    where it *might* prevent the compiler from doing additional
    useful optimizations, but there's no guarantee of that. At
    this point Duff's Device is well known enough so advanced
    compilers using structural analysis to do their flow
    analysis may very well recognize the particular form of
    multiple-entry loop that Duff's Device uses and deal with it
    appropriately.

    More significantly, the loop body that is being unrolled is
    usually very simple (otherwise why are we bothering to
    unroll it?); often times it won't be improved beyond what
    is already done in the multiple-entry loop form.


    > If you insist
    > on unrolling your loops, do something like
    >
    > for (i=0; i+8<n; i+=8) {
    > body(i+0);
    > body(i+1);
    > body(i+2);
    > body(i+3);
    > body(i+4);
    > body(i+5);
    > body(i+6);
    > body(i+7);
    > }
    > switch (n-i) {
    > case 7: body(i); i++;
    > case 6: body(i); i++;
    > case 5: body(i); i++;
    > case 4: body(i); i++;
    > case 3: body(i); i++;
    > case 2: body(i); i++;
    > case 1: body(i); i++;
    > }
    >
    > This leaves the aggregate loop body in an optimisable form.


    Minor correction - the code shown is wrong: if n is a
    multiple of eight (and greater than zero), n-8 cases are
    done rather than n cases. (There are at least two easy
    fixes, left as exercises for the reader.)

    As noted before, often times the compiler won't be able
    to optimize the code in the 'for' beyond what it would
    do in the corresponding expression in Duff's Device.

    The technique of incrementing i by 8, and using 'i+0', etc,
    in the unrolled loop body, is a good one to know; however,
    this technique can also be used with Duff's Device:

    switch( i = n % 8 ) do {
    i += 8;
    body(i-8);
    case 7: body(i-7);
    case 6: body(i-6);
    case 5: body(i-5);
    case 4: body(i-4);
    case 3: body(i-3);
    case 2: body(i-2);
    case 1: body(i-1);
    case 0: ;
    } while( i < n );

    If you compare the generated code for the two approaches, I
    expect you'll find cases where the generated code for the
    unrolled-using-switch approach is better than the generated
    code for the for-then-switch approach, along every important
    axis. (That is what I found with some simple loop 'body's.)

    Certainly it doesn't make sense to use Duff's Device in all
    cases. Most commonly, loops shouldn't be unrolled at all,
    because the benefit that might come from unrolling just
    isn't significant (and may very well be negative rather than
    positive). But when it is important to unroll a loop,
    Duff's Device is one possible approach to do that; its
    use should be considered, compared and evaluated along with
    any other alternatives. Other things being equal, code that
    runs faster and has fewer lines of code seems like a better
    choice. In cases where Duff's Device produces such code,
    there is a pretty strong argument that it's the right
    approach to use in those circumstances.

    In short, I don't think Duff's Device is either always good
    or always bad. It's just a technique to know and compare
    against other possible approaches; whether it should be
    used or not depends on how it stacks up against the other
    possibilities, in the particular circumstances being
    investigated.
     
    Tim Rentsch, Aug 29, 2005
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jamie Risk

    Duff Device reference ...

    Jamie Risk, Oct 16, 2004, in forum: C Programming
    Replies:
    1
    Views:
    448
    Michael Wojcik
    Oct 20, 2004
  2. Christopher Benson-Manica

    A small question about Duff's Device

    Christopher Benson-Manica, Oct 21, 2004, in forum: C Programming
    Replies:
    2
    Views:
    327
    Michael Wojcik
    Oct 21, 2004
  3. Hallvard B Furuseth

    Duff's Device

    Hallvard B Furuseth, Sep 28, 2006, in forum: C Programming
    Replies:
    11
    Views:
    665
    Hallvard B Furuseth
    Oct 3, 2006
  4. Replies:
    10
    Views:
    747
  5. yawnmoth

    how does duff's device work?

    yawnmoth, Nov 9, 2008, in forum: C Programming
    Replies:
    7
    Views:
    348
    Phil Carmody
    Nov 9, 2008
Loading...

Share This Page