For loop equivalent with the preprocessor

Discussion in 'C Programming' started by Nudge, Sep 14, 2003.

  1. Nudge

    Nudge Guest

    I have an array, and an unrolled loop which looks like this:

    do_something(A[0]);
    do_something(A[1]);
    ....
    do_something(A[255]);

    I thought: why should I type so much? I should write a macro.

    So I was looking to write something along the lines of:

    #define write_256(array) #for(i,0,255,do_something(foo);

    But I coudn't find a way to do it...

    Is there a way to write a macro that the C pre-processor will expand
    to k instructions, and be able to reference the iteration number?

    I thought of using recursive macro calls along the lines of:

    #define write_more(n,array) \
    #if (n>0) do_something(foo[256-n]); \
    write_more(n-1,array)

    but it seems implementers don't like recursive macro calls :)

    Any insight?
     
    Nudge, Sep 14, 2003
    #1
    1. Advertising

  2. Nudge <> scribbled the following:
    > I have an array, and an unrolled loop which looks like this:


    > do_something(A[0]);
    > do_something(A[1]);
    > ...
    > do_something(A[255]);


    > I thought: why should I type so much? I should write a macro.


    I think: Why are you bothering with this? Re-roll your loop into a
    genuine for loop, set your compiler's optimisation to "pretty darned
    high", and it might unroll the loop for you when generating code.
    Remember the rules about optimisation at source code level:
    (1) Don't do it.
    (2) (For experts only!) Don't do it yet.

    --
    /-- Joona Palaste () ---------------------------\
    | Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
    | http://www.helsinki.fi/~palaste W++ B OP+ |
    \----------------------------------------- Finland rules! ------------/
     
    Joona I Palaste, Sep 14, 2003
    #2
    1. Advertising

  3. Nudge

    Nudge Guest

    Joona I Palaste wrote:

    > Nudge <> scribbled the following:
    >
    >>I have an array, and an unrolled loop which looks like this:

    >
    >
    >>do_something(A[0]);
    >>do_something(A[1]);
    >>...
    >>do_something(A[255]);

    >
    >
    >>I thought: why should I type so much? I should write a macro.

    >
    >
    > I think: Why are you bothering with this? Re-roll your loop into a
    > genuine for loop, set your compiler's optimisation to "pretty darned
    > high", and it might unroll the loop for you when generating code.
    > Remember the rules about optimisation at source code level:
    > (1) Don't do it.
    > (2) (For experts only!) Don't do it yet.


    ARGH!

    I knew I should have said that I was smarter than my compiler :)

    I am not writing any serious code, only fooling around.

    Is there a way to beat the preprocessor into submission?
     
    Nudge, Sep 14, 2003
    #3
  4. In 'comp.lang.c', Nudge <> wrote:

    > I have an array, and an unrolled loop which looks like this:
    >
    > do_something(A[0]);
    > do_something(A[1]);
    > ...
    > do_something(A[255]);
    >
    > I thought: why should I type so much? I should write a macro.
    >
    > So I was looking to write something along the lines of:
    >
    > #define write_256(array) #for(i,0,255,do_something(foo);
    >
    > But I coudn't find a way to do it...


    There is no portable way do do this directly, but there is a trick anyway (I
    got it from c.l.c, BTW):

    Define the list of constant values:

    /* values.itm */
    ITEM(0)
    ITEM(1)
    ITEM(2)
    ITEM(3)
    ....
    ITEM(254)
    ITEM(255)

    This file can be automatically created with a trivial C program.

    Now, in your application, you do the following:

    /* myapp.c */
    <...>

    #define ITEM(a) \
    do_something(A[a]);

    #include "values.itm"
    #undef ITEM

    That's all folks.

    I use massively this trick to define repetitive code. It's very powerful and
    makes solid code when mastered.

    --
    -ed- [remove YOURBRA before answering me]
    The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    <blank line>
    FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
     
    Emmanuel Delahaye, Sep 14, 2003
    #4
  5. "Nudge" <> wrote in message
    news:3f64df26$0$27581$...
    > Joona I Palaste wrote:
    >
    > > Nudge <> scribbled the following:
    > >
    > >>I have an array, and an unrolled loop which looks like this:

    > >
    > >
    > >>do_something(A[0]);
    > >>do_something(A[1]);
    > >>...
    > >>do_something(A[255]);

    > >
    > >
    > >>I thought: why should I type so much? I should write a macro.

    > >
    > >
    > > I think: Why are you bothering with this? Re-roll your loop into a
    > > genuine for loop, set your compiler's optimisation to "pretty darned
    > > high", and it might unroll the loop for you when generating code.
    > > Remember the rules about optimisation at source code level:
    > > (1) Don't do it.
    > > (2) (For experts only!) Don't do it yet.

    >
    > ARGH!
    >
    > I knew I should have said that I was smarter than my compiler :)
    >
    > I am not writing any serious code, only fooling around.
    >
    > Is there a way to beat the preprocessor into submission?

    If you absolutely insist...

    #define X0(n) do_something(A[n]);
    #define X1(n) X0(n) X0(n+1)
    #define X2(n) X1(n) X1(n+2)
    #define X3(n) X2(n) X2(n+4)
    #define X4(n) X3(n) X3(n+8)
    #define X5(n) X4(n) X4(n+16)
    #define X6(n) X5(n) X5(n+32)
    #define X7(n) X6(n) X6(n+64)
    #define X8(n) X7(n) X7(n+128)

    X8(0)

    If you ask me, Joona's suggestion is considerably more reasonable.

    --
    poncho
     
    Scott Fluhrer, Sep 14, 2003
    #5
  6. I've never done this kind of stuff, but isn't there the option -funroll-loop
    (gcc)?
    I've seen lot of software use this option? Joona also suggested something
    similar I think?

    --
    Pushkar Pradhan
    "Emmanuel Delahaye" <> wrote in message
    news:Xns93F6F10078DF3hsnoservernet@130.133.1.4...
    > In 'comp.lang.c', Nudge <> wrote:
    >
    > > I have an array, and an unrolled loop which looks like this:
    > >
    > > do_something(A[0]);
    > > do_something(A[1]);
    > > ...
    > > do_something(A[255]);
    > >
    > > I thought: why should I type so much? I should write a macro.
    > >
    > > So I was looking to write something along the lines of:
    > >
    > > #define write_256(array) #for(i,0,255,do_something(foo);
    > >
    > > But I coudn't find a way to do it...

    >
    > There is no portable way do do this directly, but there is a trick anyway

    (I
    > got it from c.l.c, BTW):
    >
    > Define the list of constant values:
    >
    > /* values.itm */
    > ITEM(0)
    > ITEM(1)
    > ITEM(2)
    > ITEM(3)
    > ...
    > ITEM(254)
    > ITEM(255)
    >
    > This file can be automatically created with a trivial C program.
    >
    > Now, in your application, you do the following:
    >
    > /* myapp.c */
    > <...>
    >
    > #define ITEM(a) \
    > do_something(A[a]);
    >
    > #include "values.itm"
    > #undef ITEM
    >
    > That's all folks.
    >
    > I use massively this trick to define repetitive code. It's very powerful

    and
    > makes solid code when mastered.
    >
    > --
    > -ed- [remove YOURBRA before answering me]
    > The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    > <blank line>
    > FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
     
    Pushker Pradhan, Sep 14, 2003
    #6
  7. Nudge

    bd Guest

    Pushker Pradhan wrote:

    > I've never done this kind of stuff, but isn't there the option
    > -funroll-loop (gcc)?
    > I've seen lot of software use this option? Joona also suggested something
    > similar I think?


    In gcc, -funroll-loop causes loops to be unrolled into, well, non-loops.
    Example:
    int counter = 0;
    int i;
    for(i = 0; i < 5; i++)
    counter++;

    Becomes:
    int counter = 0;
    int i;
    counter++;
    counter++;
    counter++;
    counter++;
    counter++;
    i = 5;

    Of course, this is all done in assembler, so the result's probably different
    somewhat. But that's the idea behind unrolling loops - to eliminate the
    looping overhead.
    --
    Feminists just want the human race to be a tie.
     
    bd, Sep 15, 2003
    #7
  8. "Nudge" <> wrote in message
    news:3f64daf9$0$27568$...
    > I have an array, and an unrolled loop which looks like this:
    >
    > do_something(A[0]);
    > do_something(A[1]);
    > ...
    > do_something(A[255]);
    >
    > I thought: why should I type so much? I should write a macro.
    >
    > So I was looking to write something along the lines of:
    >
    > #define write_256(array) #for(i,0,255,do_something(foo);
    >
    > But I coudn't find a way to do it...
    >
    > Is there a way to write a macro that the C pre-processor will expand
    > to k instructions, and be able to reference the iteration number?


    The PL/I preprocessor can do this.

    %DO I=0 TO 255;
    do_something(A);
    %END
    %DEACTIVATE I

    (The last is the equivalent of #undef, except that you can %ACTIVATE it and
    get the previous value back again. The % indicates preprocessor
    statements.)

    I have wondered about the decrease in preprocessor power as computers get
    faster and computer memory gets larger. Consider the progression from PL/I
    to C to Java in terms of preprocessor power.

    There is, of course, no rule against using preprocessors for other than the
    intended language.

    -- glen
     
    Glen Herrmannsfeldt, Sep 15, 2003
    #8
  9. In 'comp.lang.c', "Pushker Pradhan" <> wrote:

    > I've never done this kind of stuff, but isn't there the option
    > -funroll-loop (gcc)?
    > I've seen lot of software use this option? Joona also suggested
    > something similar I think?


    If your compiler has the option, it's fine, but what if it has not? You have
    not always the choice of your compiler. It can belong to a customer's
    production process where nothing is allowed to change, or belong to some
    peculiar platform... who knows... Better to stay portable. Always (when
    possible).


    --
    -ed- [remove YOURBRA before answering me]
    The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    <blank line>
    FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
     
    Emmanuel Delahaye, Sep 15, 2003
    #9
  10. In 'comp.lang.c', "Glen Herrmannsfeldt" <> wrote:

    > The PL/I preprocessor can do this.
    >
    > %DO I=0 TO 255;
    > do_something(A);
    > %END
    > %DEACTIVATE I
    >
    > (The last is the equivalent of #undef, except that you can %ACTIVATE it and
    > get the previous value back again. The % indicates preprocessor
    > statements.)
    >
    > I have wondered about the decrease in preprocessor power as computers get
    > faster and computer memory gets larger. Consider the progression from PL/I
    > to C to Java in terms of preprocessor power.


    I recall that the Intel macro assembler for 8051 I used in the 90's was very
    powerfull, and did support iterative constructs like that.

    --
    -ed- [remove YOURBRA before answering me]
    The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
    <blank line>
    FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
     
    Emmanuel Delahaye, Sep 15, 2003
    #10
  11. "Emmanuel Delahaye" <> wrote in message
    news:Xns93F7CB2CD4190hsnoservernet@130.133.1.4...
    > In 'comp.lang.c', "Glen Herrmannsfeldt" <> wrote:
    >
    > > The PL/I preprocessor can do this.
    > >
    > > %DO I=0 TO 255;
    > > do_something(A);
    > > %END
    > > %DEACTIVATE I
    > >
    > > (The last is the equivalent of #undef, except that you can %ACTIVATE it

    and
    > > get the previous value back again. The % indicates preprocessor
    > > statements.)
    > >
    > > I have wondered about the decrease in preprocessor power as computers

    get
    > > faster and computer memory gets larger. Consider the progression from

    PL/I
    > > to C to Java in terms of preprocessor power.

    >
    > I recall that the Intel macro assembler for 8051 I used in the 90's was

    very
    > powerfull, and did support iterative constructs like that.


    The OS/360 assemblers supported a variety of constructs, including if and
    goto. The while loop could be, and often was, written using macros. I used
    to know people using the OS/360 compilers to compile for most microcomputers
    by writing a macro for each possible opcode, and then post-processing the
    output files.

    I never used the 8051, though.

    -- glen
     
    Glen Herrmannsfeldt, Sep 15, 2003
    #11
  12. Nudge

    Nudge Guest

    Scott Fluhrer wrote:

    > "Nudge" <> wrote in message
    > news:3f64df26$0$27581$...
    >
    >>Joona I Palaste wrote:
    >>
    >>
    >>>Nudge <> scribbled the following:
    >>>
    >>>
    >>>>I have an array, and an unrolled loop which looks like this:
    >>>
    >>>
    >>>>do_something(A[0]);
    >>>>do_something(A[1]);
    >>>>...
    >>>>do_something(A[255]);
    >>>
    >>>
    >>>>I thought: why should I type so much? I should write a macro.
    >>>
    >>>
    >>>I think: Why are you bothering with this? Re-roll your loop into a
    >>>genuine for loop, set your compiler's optimisation to "pretty darned
    >>>high", and it might unroll the loop for you when generating code.
    >>>Remember the rules about optimisation at source code level:
    >>>(1) Don't do it.
    >>>(2) (For experts only!) Don't do it yet.

    >>
    >>ARGH!
    >>
    >>I knew I should have said that I was smarter than my compiler :)
    >>
    >>I am not writing any serious code, only fooling around.
    >>
    >>Is there a way to beat the preprocessor into submission?

    >
    > If you absolutely insist...
    >
    > #define X0(n) do_something(A[n]);
    > #define X1(n) X0(n) X0(n+1)
    > #define X2(n) X1(n) X1(n+2)
    > #define X3(n) X2(n) X2(n+4)
    > #define X4(n) X3(n) X3(n+8)
    > #define X5(n) X4(n) X4(n+16)
    > #define X6(n) X5(n) X5(n+32)
    > #define X7(n) X6(n) X6(n+64)
    > #define X8(n) X7(n) X7(n+128)
    >
    > X8(0)
    >
    > If you ask me, Joona's suggestion is considerably more reasonable.


    What about SHA-256's inner loop, where unrolling 8 times and
    symbolically renaming the 8 variables for every iteration allows one
    to write only 4 assignments?

    I don't think there are many optimizing compilers out there which
    are THAT smart...

    Anyway, thanks for the log2(n) solution. I find it fairly elegant.
     
    Nudge, Sep 15, 2003
    #12
  13. "Nudge" <> wrote in message
    news:3f6632ad$0$27575$...
    > Scott Fluhrer wrote:
    >
    > > "Nudge" <> wrote in message
    > > news:3f64df26$0$27581$...
    > >
    > >>Joona I Palaste wrote:
    > >>
    > >>
    > >>>Nudge <> scribbled the following:
    > >>>
    > >>>
    > >>>>I have an array, and an unrolled loop which looks like this:
    > >>>
    > >>>
    > >>>>do_something(A[0]);
    > >>>>do_something(A[1]);
    > >>>>...
    > >>>>do_something(A[255]);
    > >>>
    > >>>
    > >>>>I thought: why should I type so much? I should write a macro.
    > >>>
    > >>>
    > >>>I think: Why are you bothering with this? Re-roll your loop into a
    > >>>genuine for loop, set your compiler's optimisation to "pretty darned
    > >>>high", and it might unroll the loop for you when generating code.
    > >>>Remember the rules about optimisation at source code level:
    > >>>(1) Don't do it.
    > >>>(2) (For experts only!) Don't do it yet.
    > >>
    > >>ARGH!
    > >>
    > >>I knew I should have said that I was smarter than my compiler :)
    > >>
    > >>I am not writing any serious code, only fooling around.
    > >>
    > >>Is there a way to beat the preprocessor into submission?

    > >
    > > If you absolutely insist...
    > >
    > > #define X0(n) do_something(A[n]);
    > > #define X1(n) X0(n) X0(n+1)
    > > #define X2(n) X1(n) X1(n+2)
    > > #define X3(n) X2(n) X2(n+4)
    > > #define X4(n) X3(n) X3(n+8)
    > > #define X5(n) X4(n) X4(n+16)
    > > #define X6(n) X5(n) X5(n+32)
    > > #define X7(n) X6(n) X6(n+64)
    > > #define X8(n) X7(n) X7(n+128)
    > >
    > > X8(0)
    > >
    > > If you ask me, Joona's suggestion is considerably more reasonable.

    >
    > What about SHA-256's inner loop, where unrolling 8 times and
    > symbolically renaming the 8 variables for every iteration allows one
    > to write only 4 assignments?
    >
    > I don't think there are many optimizing compilers out there which
    > are THAT smart...
    >
    > Anyway, thanks for the log2(n) solution. I find it fairly elegant.

    Actually, elegant it ain't. Some obvious problems:

    - A rather annoying amount of changes are required if you change the 256 to,
    say, 300
    - It's fairly easy to have a typo somewhere in all the #define's. It'll
    compile just fine, but perhaps miss do_something(A[153]);
    - It is certainly less readable than an explicit for loop

    In my opinion, just stick with the explicit for loop

    --
    poncho
     
    Scott Fluhrer, Sep 16, 2003
    #13
  14. Nudge

    Nudge Guest

    >>What about SHA-256's inner loop, where unrolling 8 times and
    >>symbolically renaming the 8 variables for every iteration allows one
    >>to write only 4 assignments?
    >>
    >>I don't think there are many optimizing compilers out there which
    >>are THAT smart...
    >>
    >>Anyway, thanks for the log2(n) solution. I find it fairly elegant.

    >
    > Actually, elegant it ain't. Some obvious problems:
    >
    > - A rather annoying amount of changes are required if you change the 256 to,
    > say, 300
    > - It's fairly easy to have a typo somewhere in all the #define's. It'll
    > compile just fine, but perhaps miss do_something(A[153]);
    > - It is certainly less readable than an explicit for loop
    >
    > In my opinion, just stick with the explicit for loop


    Forgive me if I insist :)

    What about SHA-256's inner loop? (You're a cryptographer, you know
    what I'm talking about.)

    With no unrolling, one must turn a-h into an array (which puts
    additional pressure on the compiler) and use the iteration number as
    a shifting index, modulo 8.

    With 8 bodies of the loop, the modulo operations can go away.

    I doubt whether even a few optimizing compilers are smart enough to
    do that (I will test with gcc).
     
    Nudge, Sep 16, 2003
    #14
  15. Nudge

    Nudge Guest

    >> Anyway, thanks for the log2(n) solution. I find it fairly
    >> elegant.

    >
    > Actually, elegant it ain't. Some obvious problems:
    >
    > - A rather annoying amount of changes are required if you change
    > the 256 to, say, 300
    >
    > - It's fairly easy to have a typo somewhere in all the #define's.
    > It'll compile just fine, but perhaps miss do_something(A[153]);
    >
    > - It is certainly less readable than an explicit for loop
    >
    > In my opinion, just stick with the explicit for loop


    Forgive me if I insist.

    What about SHA-256's inner loop? (You're a cryptographer, you know
    what I'm talking about.)

    I want to write it like this:

    T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
    T2 = SIGMA0(a) + Maj(a,b,c);
    d += T1;
    h = T1 + T2;

    In the next iteration, I perform symbolic renaming, i.e.
    h is now a
    a is now b
    b is now c
    c is now d
    d is now e
    e is now f
    f is now g
    g is now h

    If I unroll 8 times, symbolic renaming comes for free.
     
    Nudge, Sep 16, 2003
    #15
  16. "Nudge" <> wrote in message
    news:3f66bb02$0$20153$...
    > >> Anyway, thanks for the log2(n) solution. I find it fairly
    > >> elegant.

    > >
    > > Actually, elegant it ain't. Some obvious problems:
    > >
    > > - A rather annoying amount of changes are required if you change
    > > the 256 to, say, 300
    > >
    > > - It's fairly easy to have a typo somewhere in all the #define's.
    > > It'll compile just fine, but perhaps miss do_something(A[153]);
    > >
    > > - It is certainly less readable than an explicit for loop
    > >
    > > In my opinion, just stick with the explicit for loop

    >
    > Forgive me if I insist.
    >
    > What about SHA-256's inner loop? (You're a cryptographer, you know
    > what I'm talking about.)

    If you're asking whether unrolling a loop is ever justified, well, on
    occasion it is. You just have to remember the costs:

    - It can be less obvious what's going on, making maintenance harder.
    - If you unroll too much, the loop might not fit in cache. This leads to
    slower performance.

    >
    > I want to write it like this:
    >
    > T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
    > T2 = SIGMA0(a) + Maj(a,b,c);
    > d += T1;
    > h = T1 + T2;
    >
    > In the next iteration, I perform symbolic renaming, i.e.
    > h is now a
    > a is now b
    > b is now c
    > c is now d
    > d is now e
    > e is now f
    > f is now g
    > g is now h
    >
    > If I unroll 8 times, symbolic renaming comes for free.

    On the other hand, it's nontrivial to use my hack to do the renaming for
    you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
    invocations.

    --
    poncho
     
    Scott Fluhrer, Sep 17, 2003
    #16
  17. Nudge

    Eric Sosman Guest

    Scott Fluhrer wrote:
    >
    > "Nudge" <> wrote in message
    > news:3f66bb02$0$20153$...
    > > >> Anyway, thanks for the log2(n) solution. I find it fairly
    > > >> elegant.
    > > >
    > > > Actually, elegant it ain't. Some obvious problems:
    > > >
    > > > - A rather annoying amount of changes are required if you change
    > > > the 256 to, say, 300
    > > >
    > > > - It's fairly easy to have a typo somewhere in all the #define's.
    > > > It'll compile just fine, but perhaps miss do_something(A[153]);
    > > >
    > > > - It is certainly less readable than an explicit for loop
    > > >
    > > > In my opinion, just stick with the explicit for loop

    > >
    > > Forgive me if I insist.
    > >
    > > What about SHA-256's inner loop? (You're a cryptographer, you know
    > > what I'm talking about.)

    > If you're asking whether unrolling a loop is ever justified, well, on
    > occasion it is. You just have to remember the costs:
    >
    > - It can be less obvious what's going on, making maintenance harder.
    > - If you unroll too much, the loop might not fit in cache. This leads to
    > slower performance.
    >
    > >
    > > I want to write it like this:
    > >
    > > T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
    > > T2 = SIGMA0(a) + Maj(a,b,c);
    > > d += T1;
    > > h = T1 + T2;
    > >
    > > In the next iteration, I perform symbolic renaming, i.e.
    > > h is now a
    > > a is now b
    > > b is now c
    > > c is now d
    > > d is now e
    > > e is now f
    > > f is now g
    > > g is now h
    > >
    > > If I unroll 8 times, symbolic renaming comes for free.

    > On the other hand, it's nontrivial to use my hack to do the renaming for
    > you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
    > invocations.


    For the case in point, you could write something like

    #define STEP(a,b,c,d,e,f,g,h) \
    T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h; \
    T2 = SIGMA0(a) + Maj(a,b,c); \
    d += T1; \
    h = T1 + T2
    ...
    while (whatever) {
    STEP(a,b,c,d,e,f,g,h);
    STEP(b,c,d,e,f,g,h,a);
    ...
    STEP(h,a,b,c,d,e,f,g);
    }

    (Hmmm: If speed is the objective, couldn't the `W[t] + K[t]'
    piece be replaced by a precomputed `WplusK[t]'?)

    --
     
    Eric Sosman, Sep 17, 2003
    #17
  18. Nudge

    Nudge Guest

    >> What about SHA-256's inner loop? (You're a cryptographer, you know
    >> what I'm talking about.)

    >
    > If you're asking whether unrolling a loop is ever justified, well, on
    > occasion it is. You just have to remember the costs:
    >
    > - It can be less obvious what's going on, making maintenance harder.
    > - If you unroll too much, the loop might not fit in cache. This leads to
    > slower performance.


    Point taken.

    >>I want to write it like this:
    >>
    >>T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
    >>T2 = SIGMA0(a) + Maj(a,b,c);
    >>d += T1;
    >>h = T1 + T2;
    >>
    >>In the next iteration, I perform symbolic renaming, i.e.
    >> h is now a
    >> a is now b
    >> b is now c
    >> c is now d
    >> d is now e
    >> e is now f
    >> f is now g
    >> g is now h
    >>
    >>If I unroll 8 times, symbolic renaming comes for free.

    >
    > On the other hand, it's nontrivial to use my hack to do the renaming for
    > you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
    > invocations.


    Well, if I change a-h to V[8] then
    a is V[(64-t)%8]
    b is V[(65-t)%8]
    ...
    h is V[(71-t)%8]

    All the indices can be computed at compile time. A good compiler
    should be able to scalarize my array back to a-h.

    Yes, 8 is not too bad, but a complete unroll requires 64
    (approximately 6 KB of code, it will fit inside the L1 I$ of my
    Athlon with room to spare).
     
    Nudge, Sep 17, 2003
    #18
  19. Nudge

    Nudge Guest

    > For the case in point, you could write something like
    >
    > #define STEP(a,b,c,d,e,f,g,h) \
    > T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h; \
    > T2 = SIGMA0(a) + Maj(a,b,c); \
    > d += T1; \
    > h = T1 + T2
    > ...
    > while (whatever) {
    > STEP(a,b,c,d,e,f,g,h);
    > STEP(b,c,d,e,f,g,h,a);
    > ...
    > STEP(h,a,b,c,d,e,f,g);
    > }
    >
    > (Hmmm: If speed is the objective, couldn't the `W[t] + K[t]'
    > piece be replaced by a precomputed `WplusK[t]'?)


    This is precisely what I am doing. Thanks for the hint :)
     
    Nudge, Sep 17, 2003
    #19
  20. "Nudge" <> wrote in message
    news:3f68e436$0$20641$...
    > >> What about SHA-256's inner loop? (You're a cryptographer, you know
    > >> what I'm talking about.)

    > >
    > > If you're asking whether unrolling a loop is ever justified, well, on
    > > occasion it is. You just have to remember the costs:
    > >
    > > - It can be less obvious what's going on, making maintenance harder.
    > > - If you unroll too much, the loop might not fit in cache. This leads

    to
    > > slower performance.

    >
    > Point taken.
    >
    > >>I want to write it like this:
    > >>
    > >>T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
    > >>T2 = SIGMA0(a) + Maj(a,b,c);
    > >>d += T1;
    > >>h = T1 + T2;
    > >>
    > >>In the next iteration, I perform symbolic renaming, i.e.
    > >> h is now a
    > >> a is now b
    > >> b is now c
    > >> c is now d
    > >> d is now e
    > >> e is now f
    > >> f is now g
    > >> g is now h
    > >>
    > >>If I unroll 8 times, symbolic renaming comes for free.

    > >
    > > On the other hand, it's nontrivial to use my hack to do the renaming for
    > > you. I'd suggest that if you unroll it 8 times, you explicitly list the

    8
    > > invocations.

    >
    > Well, if I change a-h to V[8] then
    > a is V[(64-t)%8]
    > b is V[(65-t)%8]
    > ...
    > h is V[(71-t)%8]
    >
    > All the indices can be computed at compile time. A good compiler
    > should be able to scalarize my array back to a-h.
    >
    > Yes, 8 is not too bad, but a complete unroll requires 64
    > (approximately 6 KB of code, it will fit inside the L1 I$ of my
    > Athlon with room to spare).

    Yes, it'll fit, but it'll push most everything else out. That means that
    once you finish the SHA-256, you'll start generating lots of cache misses
    :-(

    --
    poncho
     
    Scott Fluhrer, Sep 18, 2003
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. =?Utf-8?B?VGltOjouLg==?=

    Loop the loop...

    =?Utf-8?B?VGltOjouLg==?=, Feb 16, 2005, in forum: ASP .Net
    Replies:
    2
    Views:
    1,403
    Karl Seguin
    Feb 16, 2005
  2. Steven

    while loop in a while loop

    Steven, Mar 24, 2005, in forum: Java
    Replies:
    5
    Views:
    2,274
    Tim Slattery
    Mar 30, 2005
  3. -
    Replies:
    12
    Views:
    707
    Remon van Vliet
    Jun 15, 2005
  4. Cronus
    Replies:
    1
    Views:
    698
    Paul Mensonides
    Jul 15, 2004
  5. Isaac Won
    Replies:
    9
    Views:
    419
    Ulrich Eckhardt
    Mar 4, 2013
Loading...

Share This Page