For loop equivalent with the preprocessor

Nudge · Sep 14, 2003

I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
....
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

So I was looking to write something along the lines of:

#define write_256(array) #for(i,0,255,do_something(foo);

But I coudn't find a way to do it...

Is there a way to write a macro that the C pre-processor will expand
to k instructions, and be able to reference the iteration number?

I thought of using recursive macro calls along the lines of:

#define write_more(n,array) \
#if (n>0) do_something(foo[256-n]); \
write_more(n-1,array)

but it seems implementers don't like recursive macro calls

Any insight?

Joona I Palaste · Sep 14, 2003

Nudge said:
I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

Click to expand...

I thought: why should I type so much? I should write a macro.

I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

Nudge · Sep 14, 2003

Joona said:
Nudge said:

I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

Click to expand...

I thought: why should I type so much? I should write a macro.

Click to expand...

I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

ARGH!

I knew I should have said that I was smarter than my compiler

I am not writing any serious code, only fooling around.

Is there a way to beat the preprocessor into submission?

Emmanuel Delahaye · Sep 14, 2003

In 'comp.lang.c' said:
I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

So I was looking to write something along the lines of:

#define write_256(array) #for(i,0,255,do_something(foo);

But I coudn't find a way to do it...

There is no portable way do do this directly, but there is a trick anyway (I
got it from c.l.c, BTW):

Define the list of constant values:

/* values.itm */
ITEM(0)
ITEM(1)
ITEM(2)
ITEM(3)
....
ITEM(254)
ITEM(255)

This file can be automatically created with a trivial C program.

Now, in your application, you do the following:

/* myapp.c */
<...>

#define ITEM(a) \
do_something(A[a]);

#include "values.itm"
#undef ITEM

That's all folks.

I use massively this trick to define repetitive code. It's very powerful and
makes solid code when mastered.

Scott Fluhrer · Sep 14, 2003

Nudge said:
Joona said:

Nudge said:

I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

Click to expand...

I thought: why should I type so much? I should write a macro.

Click to expand...

I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

Click to expand...

ARGH!

I knew I should have said that I was smarter than my compiler

I am not writing any serious code, only fooling around.

Is there a way to beat the preprocessor into submission?

If you absolutely insist...

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

X8(0)

If you ask me, Joona's suggestion is considerably more reasonable.

Pushker Pradhan · Sep 14, 2003

I've never done this kind of stuff, but isn't there the option -funroll-loop
(gcc)?
I've seen lot of software use this option? Joona also suggested something
similar I think?

--
Pushkar Pradhan

Emmanuel Delahaye said:
In 'comp.lang.c' said:

I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

So I was looking to write something along the lines of:

#define write_256(array) #for(i,0,255,do_something(foo);

But I coudn't find a way to do it...

Click to expand...

There is no portable way do do this directly, but there is a trick anyway (I
got it from c.l.c, BTW):

Define the list of constant values:

/* values.itm */
ITEM(0)
ITEM(1)
ITEM(2)
ITEM(3)
...
ITEM(254)
ITEM(255)

This file can be automatically created with a trivial C program.

Now, in your application, you do the following:

/* myapp.c */
<...>

#define ITEM(a) \
do_something(A[a]);

#include "values.itm"
#undef ITEM

That's all folks.

I use massively this trick to define repetitive code. It's very powerful and
makes solid code when mastered.

--
-ed- (e-mail address removed) [remove YOURBRA before answering me]
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
<blank line>
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/

bd · Sep 15, 2003

Pushker said:
I've never done this kind of stuff, but isn't there the option
-funroll-loop (gcc)?
I've seen lot of software use this option? Joona also suggested something
similar I think?

In gcc, -funroll-loop causes loops to be unrolled into, well, non-loops.
Example:
int counter = 0;
int i;
for(i = 0; i < 5; i++)
counter++;

Becomes:
int counter = 0;
int i;
counter++;
counter++;
counter++;
counter++;
counter++;
i = 5;

Of course, this is all done in assembler, so the result's probably different
somewhat. But that's the idea behind unrolling loops - to eliminate the
looping overhead.

Glen Herrmannsfeldt · Sep 15, 2003

Nudge said:
I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

So I was looking to write something along the lines of:

#define write_256(array) #for(i,0,255,do_something(foo);

But I coudn't find a way to do it...

Is there a way to write a macro that the C pre-processor will expand
to k instructions, and be able to reference the iteration number?

The PL/I preprocessor can do this.

%DO I=0 TO 255;
do_something(A);
%END
%DEACTIVATE I

(The last is the equivalent of #undef, except that you can %ACTIVATE it and
get the previous value back again. The % indicates preprocessor
statements.)

I have wondered about the decrease in preprocessor power as computers get
faster and computer memory gets larger. Consider the progression from PL/I
to C to Java in terms of preprocessor power.

There is, of course, no rule against using preprocessors for other than the
intended language.

-- glen

Emmanuel Delahaye · Sep 15, 2003

Pushker Pradhan said:
I've never done this kind of stuff, but isn't there the option
-funroll-loop (gcc)?
I've seen lot of software use this option? Joona also suggested
something similar I think?

If your compiler has the option, it's fine, but what if it has not? You have
not always the choice of your compiler. It can belong to a customer's
production process where nothing is allowed to change, or belong to some
peculiar platform... who knows... Better to stay portable. Always (when
possible).

Emmanuel Delahaye · Sep 15, 2003

Glen Herrmannsfeldt said:
The PL/I preprocessor can do this.

%DO I=0 TO 255;
do_something(A);
%END
%DEACTIVATE I

(The last is the equivalent of #undef, except that you can %ACTIVATE it and
get the previous value back again. The % indicates preprocessor
statements.)

I have wondered about the decrease in preprocessor power as computers get
faster and computer memory gets larger. Consider the progression from PL/I
to C to Java in terms of preprocessor power.

I recall that the Intel macro assembler for 8051 I used in the 90's was very
powerfull, and did support iterative constructs like that.

Glen Herrmannsfeldt · Sep 15, 2003

Emmanuel Delahaye said:
Glen Herrmannsfeldt said:

The PL/I preprocessor can do this.

%DO I=0 TO 255;
do_something(A);
%END
%DEACTIVATE I

(The last is the equivalent of #undef, except that you can %ACTIVATE it and
get the previous value back again. The % indicates preprocessor
statements.)

I have wondered about the decrease in preprocessor power as computers get
faster and computer memory gets larger. Consider the progression from PL/I
to C to Java in terms of preprocessor power.

Click to expand...

I recall that the Intel macro assembler for 8051 I used in the 90's was very
powerfull, and did support iterative constructs like that.

The OS/360 assemblers supported a variety of constructs, including if and
goto. The while loop could be, and often was, written using macros. I used
to know people using the OS/360 compilers to compile for most microcomputers
by writing a macro for each possible opcode, and then post-processing the
output files.

I never used the 8051, though.

-- glen

Nudge · Sep 15, 2003

Scott said:
Joona I Palaste wrote:

Nudge <[email protected]> scribbled the following:

I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

Click to expand...

ARGH!

I knew I should have said that I was smarter than my compiler

I am not writing any serious code, only fooling around.

Is there a way to beat the preprocessor into submission?

Click to expand...

If you absolutely insist...

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

X8(0)

If you ask me, Joona's suggestion is considerably more reasonable.

What about SHA-256's inner loop, where unrolling 8 times and
symbolically renaming the 8 variables for every iteration allows one
to write only 4 assignments?

I don't think there are many optimizing compilers out there which
are THAT smart...

Anyway, thanks for the log2(n) solution. I find it fairly elegant.

Scott Fluhrer · Sep 16, 2003

Nudge said:
Scott said:

Joona I Palaste wrote:

Nudge <[email protected]> scribbled the following:

I have an array, and an unrolled loop which looks like this:

do_something(A[0]);
do_something(A[1]);
...
do_something(A[255]);

I thought: why should I type so much? I should write a macro.

I think: Why are you bothering with this? Re-roll your loop into a
genuine for loop, set your compiler's optimisation to "pretty darned
high", and it might unroll the loop for you when generating code.
Remember the rules about optimisation at source code level:
(1) Don't do it.
(2) (For experts only!) Don't do it yet.

ARGH!

I knew I should have said that I was smarter than my compiler

I am not writing any serious code, only fooling around.

Is there a way to beat the preprocessor into submission?

Click to expand...

If you absolutely insist...

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

X8(0)

If you ask me, Joona's suggestion is considerably more reasonable.

Click to expand...

What about SHA-256's inner loop, where unrolling 8 times and
symbolically renaming the 8 variables for every iteration allows one
to write only 4 assignments?

I don't think there are many optimizing compilers out there which
are THAT smart...

Anyway, thanks for the log2(n) solution. I find it fairly elegant.

Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change the 256 to,
say, 300
- It's fairly easy to have a typo somewhere in all the #define's. It'll
compile just fine, but perhaps miss do_something(A[153]);
- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop

--
poncho

Nudge · Sep 16, 2003

What about SHA-256's inner loop, where unrolling 8 times and

symbolically renaming the 8 variables for every iteration allows one
to write only 4 assignments?

I don't think there are many optimizing compilers out there which
are THAT smart...

Anyway, thanks for the log2(n) solution. I find it fairly elegant.

Click to expand...

Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change the 256 to,
say, 300
- It's fairly easy to have a typo somewhere in all the #define's. It'll
compile just fine, but perhaps miss do_something(A[153]);
- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop

Forgive me if I insist

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

With no unrolling, one must turn a-h into an array (which puts
additional pressure on the compiler) and use the iteration number as
a shifting index, modulo 8.

With 8 bodies of the loop, the modulo operations can go away.

I doubt whether even a few optimizing compilers are smart enough to
do that (I will test with gcc).

Nudge · Sep 16, 2003

Anyway, thanks for the log2(n) solution. I find it fairly

elegant.

Click to expand...

Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change
the 256 to, say, 300

- It's fairly easy to have a typo somewhere in all the #define's.
It'll compile just fine, but perhaps miss do_something(A[153]);

- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop

Forgive me if I insist.

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

Scott Fluhrer · Sep 17, 2003

Nudge said:
Anyway, thanks for the log2(n) solution. I find it fairly
elegant.

Click to expand...

Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change
the 256 to, say, 300

- It's fairly easy to have a typo somewhere in all the #define's.
It'll compile just fine, but perhaps miss do_something(A[153]);

- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop

Click to expand...

Forgive me if I insist.

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

On the other hand, it's nontrivial to use my hack to do the renaming for
you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
invocations.

Eric Sosman · Sep 17, 2003

Scott said:
Nudge said:

Anyway, thanks for the log2(n) solution. I find it fairly
elegant.

Actually, elegant it ain't. Some obvious problems:

- A rather annoying amount of changes are required if you change
the 256 to, say, 300

- It's fairly easy to have a typo somewhere in all the #define's.
It'll compile just fine, but perhaps miss do_something(A[153]);

- It is certainly less readable than an explicit for loop

In my opinion, just stick with the explicit for loop

Click to expand...

Forgive me if I insist.

What about SHA-256's inner loop? (You're a cryptographer, you know
what I'm talking about.)

Click to expand...

If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

Click to expand...

On the other hand, it's nontrivial to use my hack to do the renaming for
you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
invocations.

For the case in point, you could write something like

#define STEP(a,b,c,d,e,f,g,h) \
T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h; \
T2 = SIGMA0(a) + Maj(a,b,c); \
d += T1; \
h = T1 + T2
...
while (whatever) {
STEP(a,b,c,d,e,f,g,h);
STEP(b,c,d,e,f,g,h,a);
...
STEP(h,a,b,c,d,e,f,g);
}

(Hmmm: If speed is the objective, couldn't the `W[t] + K[t]'
piece be replaced by a precomputed `WplusK[t]'?)

Nudge · Sep 17, 2003

What about SHA-256's inner loop? (You're a cryptographer, you know

If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

Point taken.

I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

Click to expand...

On the other hand, it's nontrivial to use my hack to do the renaming for
you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
invocations.

Well, if I change a-h to V[8] then
a is V[(64-t)%8]
b is V[(65-t)%8]
...
h is V[(71-t)%8]

All the indices can be computed at compile time. A good compiler
should be able to scalarize my array back to a-h.

Yes, 8 is not too bad, but a complete unroll requires 64
(approximately 6 KB of code, it will fit inside the L1 I$ of my
Athlon with room to spare).

Nudge · Sep 17, 2003

For the case in point, you could write something like

#define STEP(a,b,c,d,e,f,g,h) \
T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h; \
T2 = SIGMA0(a) + Maj(a,b,c); \
d += T1; \
h = T1 + T2
...
while (whatever) {
STEP(a,b,c,d,e,f,g,h);
STEP(b,c,d,e,f,g,h,a);
...
STEP(h,a,b,c,d,e,f,g);
}

(Hmmm: If speed is the objective, couldn't the `W[t] + K[t]'
piece be replaced by a precomputed `WplusK[t]'?)

This is precisely what I am doing. Thanks for the hint

Scott Fluhrer · Sep 18, 2003

Nudge said:
If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.
- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

Click to expand...

Point taken.

I want to write it like this:

T1 = SIGMA1(e) + Ch(e,f,g) + W[t] + K[t] + h;
T2 = SIGMA0(a) + Maj(a,b,c);
d += T1;
h = T1 + T2;

In the next iteration, I perform symbolic renaming, i.e.
h is now a
a is now b
b is now c
c is now d
d is now e
e is now f
f is now g
g is now h

If I unroll 8 times, symbolic renaming comes for free.

Click to expand...

On the other hand, it's nontrivial to use my hack to do the renaming for
you. I'd suggest that if you unroll it 8 times, you explicitly list the 8
invocations.

Click to expand...

Well, if I change a-h to V[8] then
a is V[(64-t)%8]
b is V[(65-t)%8]
...
h is V[(71-t)%8]

All the indices can be computed at compile time. A good compiler
should be able to scalarize my array back to a-h.

Yes, 8 is not too bad, but a complete unroll requires 64
(approximately 6 KB of code, it will fit inside the L1 I$ of my
Athlon with room to spare).

Yes, it'll fit, but it'll push most everything else out. That means that
once you finish the SHA-256, you'll start generating lots of cache misses
:-(

preprocessor bug?	2	May 10, 2013
SENTINEL CONTROL LOOP WHEN DEALING WITH TWO ARRAYS	1	Oct 26, 2023
Preprocessor unique names	6	Apr 11, 2009
metaprogramming with the preprocessor: when is it too much? (inspiredby the iterators thread)	3	Mar 26, 2010
Pointer problem with simple preprocessor define	9	Oct 27, 2010
Weird Behavior with Rays in C and OpenGL	4	Feb 13, 2024
Preprocessor directive including another one	3	May 29, 2008
Preprocessor trick	6	Nov 22, 2006

For loop equivalent with the preprocessor

Nudge

Joona I Palaste

Nudge

Emmanuel Delahaye

Scott Fluhrer

Pushker Pradhan

bd

Glen Herrmannsfeldt

Emmanuel Delahaye

Emmanuel Delahaye

Glen Herrmannsfeldt

Nudge

Scott Fluhrer

Nudge

Nudge

Scott Fluhrer

Eric Sosman

Nudge

Nudge

Scott Fluhrer

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads