For loop equivalent with the preprocessor

Nudge · Sep 18, 2003

Yes, 8 is not too bad, but a complete unroll requires 64

Yes, it'll fit, but it'll push most everything else out. That
means that once you finish the SHA-256, you'll start generating
lots of cache misses :-(

Hi Scott,

The entire routine weighs approximately 8 KB (75% from the unrolled
inner loop).

What do you mean by "once you finish the SHA-256"? When the OS switches
the context to a different process? I thought the cache was flushed
anyway on a context switch...

P.S. The Athlon, unlike the P4, has large L1 caches:
64 KB L1 I$
64 KB L1 D$
256 KB L2 I+D$ (512 KB for Barton)

Paul Hsieh · Sep 18, 2003

Nudge said:
Scott said:

If you absolutely insist...

#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)
#define X3(n) X2(n) X2(n+4)
#define X4(n) X3(n) X3(n+8)
#define X5(n) X4(n) X4(n+16)
#define X6(n) X5(n) X5(n+32)
#define X7(n) X6(n) X6(n+64)
#define X8(n) X7(n) X7(n+128)

X8(0)

If you ask me, Joona's suggestion is considerably more reasonable.

What about SHA-256's inner loop, where unrolling 8 times and
symbolically renaming the 8 variables for every iteration allows one
to write only 4 assignments?
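
To make the renaming idea concrete, here is a minimal sketch of eight SHA-256 rounds written both ways. The names ROUND, eight_rounds_renamed and eight_rounds_loop are made up for illustration, and kw[i] stands for the round constant plus message word (K[i] + W[i]), assumed to be precomputed by the caller. Rotating the argument list of the macro takes the place of the eight-way variable shuffle, which leaves four assignments per round (two temporaries, two state updates).

```c
#include <stdint.h>

#define ROTR(x, n)   (((x) >> (n)) | ((x) << (32 - (n))))
#define CH(e, f, g)  (((e) & (f)) ^ (~(e) & (g)))
#define MAJ(a, b, c) (((a) & (b)) ^ ((a) & (c)) ^ ((b) & (c)))
#define BSIG0(a)     (ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22))
#define BSIG1(e)     (ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25))

/* One round with renamed variables. kw is K[i] + W[i] (assumed to be
 * supplied by the caller). Only d and h are written; the other six
 * variables change role through the argument order of the next call. */
#define ROUND(a, b, c, d, e, f, g, h, kw) do {              \
        uint32_t t1 = (h) + BSIG1(e) + CH(e, f, g) + (kw);  \
        uint32_t t2 = BSIG0(a) + MAJ(a, b, c);              \
        (d) += t1;                                          \
        (h) = t1 + t2;                                      \
    } while (0)

/* Eight rounds, unrolled by hand; the argument list rotates by one
 * position per round and is back where it started after eight. */
static void eight_rounds_renamed(uint32_t s[8], const uint32_t kw[8])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];

    ROUND(a, b, c, d, e, f, g, h, kw[0]);
    ROUND(h, a, b, c, d, e, f, g, kw[1]);
    ROUND(g, h, a, b, c, d, e, f, kw[2]);
    ROUND(f, g, h, a, b, c, d, e, kw[3]);
    ROUND(e, f, g, h, a, b, c, d, kw[4]);
    ROUND(d, e, f, g, h, a, b, c, kw[5]);
    ROUND(c, d, e, f, g, h, a, b, kw[6]);
    ROUND(b, c, d, e, f, g, h, a, kw[7]);

    s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}

/* Reference version: a plain loop that shuffles all eight words
 * every round. Used here only to check the unrolled version. */
static void eight_rounds_loop(uint32_t s[8], const uint32_t kw[8])
{
    for (int i = 0; i < 8; i++) {
        uint32_t t1 = s[7] + BSIG1(s[4]) + CH(s[4], s[5], s[6]) + kw[i];
        uint32_t t2 = BSIG0(s[0]) + MAJ(s[0], s[1], s[2]);
        s[7] = s[6]; s[6] = s[5]; s[5] = s[4]; s[4] = s[3] + t1;
        s[3] = s[2]; s[2] = s[1]; s[1] = s[0]; s[0] = t1 + t2;
    }
}
```

Both functions should leave the same eight words in s for any input. A full SHA-256 would run this pattern eight times over the 64 rounds, which is where the "complete unroll requires 64" figure comes from.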

Another important case is, what if do_something() is defined as follows:

#define do_something(x) do_something_else ((x), #x, __LINE__)

I don't think there are many optimizing compilers out there which
are THAT smart...
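
One reading of this point, as a small sketch: when do_something() is itself a macro that stringifies its argument, each preprocessor-generated copy carries its own source text, which a compiler unrolling an ordinary loop has no way to reproduce. The function do_something_else below is a hypothetical stand-in that just records and prints what it receives; the array A and the counters are likewise made up for the demonstration.

```c
#include <stdio.h>

static int A[4] = {10, 20, 30, 40};
static int calls;              /* how many copies were executed       */
static int sum;                /* sum of the values passed in         */
static const char *last_text;  /* stringified argument of last copy   */
static int last_line;          /* __LINE__ seen by the last copy      */

/* Hypothetical stand-in for the do_something_else of the post. */
static void do_something_else(int value, const char *text, int line)
{
    printf("%s = %d (line %d)\n", text, value, line);
    calls++;
    sum += value;
    last_text = text;
    last_line = line;
}

#define do_something(x) do_something_else((x), #x, __LINE__)

/* Doubling macros in the style of the post, cut down to four copies. */
#define X0(n) do_something(A[n]);
#define X1(n) X0(n) X0(n+1)
#define X2(n) X1(n) X1(n+2)

static void run_unrolled(void)
{
    X2(0)   /* four calls, indices 0 through 3 */
}
```

Each call prints the index expression as it was spelled after argument substitution (something along the lines of A[0+2+1] rather than A[i]), and every copy reports the line of the single X2(0) invocation.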

Anyway, thanks for the log2(n) solution. I find it fairly elegant.

Yeah, but log4(n) is even better if you are really just trying to save typing,
though it's a little annoying to do odd numbers.
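
A sketch of what the base-4 variant might look like, assuming the same do_something(A[n]) body as before. The macro names Y0..Y2, the array contents and the hits counter are invented for the example. Each level quadruples the count, so 256 copies would need only four levels; counts that are not a power of 4 have to be pieced together by hand, which is presumably the annoyance being referred to.

```c
enum { N = 16 };

static int A[N] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
static int hits[N];   /* hits[v] counts calls made with value v */

static void do_something(int v) { hits[v]++; }

#define Y0(n) do_something(A[n]);
#define Y1(n) Y0(n) Y0((n)+1) Y0((n)+2)  Y0((n)+3)    /*  4 copies */
#define Y2(n) Y1(n) Y1((n)+4) Y1((n)+8)  Y1((n)+12)   /* 16 copies */

/* A power of 4 is a single invocation. */
static void unrolled_16(void)
{
    Y2(0)
}

/* Ten copies: 10 = 4 + 4 + 1 + 1, assembled by hand. */
static void unrolled_10(void)
{
    Y1(0) Y1(4) Y0(8) Y0(9)
}
```

The arguments are parenthesized here as a precaution; with plain additions, as in the original macros, the result is the same either way.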

Paul Hsieh · Sep 18, 2003

Scott Fluhrer said:
If you're asking whether unrolling a loop is ever justified, well, on
occasion it is. You just have to remember the costs:

- It can be less obvious what's going on, making maintenance harder.

The same can be said of any macro usage.

- If you unroll too much, the loop might not fit in cache. This leads to
slower performance.

Humans themselves generally don't unroll to a point where the I-cache
is affected. This is usually only an issue if the *compiler* unrolls
everything, since it might unroll with wild abandon.

Keith Thompson · Sep 20, 2003

Glen Herrmannsfeldt said:
I have wondered about the decrease in preprocessor power as computers get
faster and computer memory gets larger. Consider the progression from PL/I
to C to Java in terms of preprocessor power.

I once heard about a language with a generics facility (closer to C++
templates than to macros) that could do the Towers of Hanoi problem at
compile time.

There is, of course, no rule against using preprocessors for other than the
intended language.

You can run into some nasty problems with tokenization. Try using a C
preprocessor on a language that has a standalone apostrophe token.

Getting back to C, I tend to think that C's preprocessor is powerful
enough, perhaps too powerful. Or maybe it's just not integrated into
the language cleanly enough.

As for the original question (using the preprocessor to manually
unroll a loop), I don't believe there's any clean way to do that.
Some of the suggested solutions work reasonably well for powers of 2,
but not for arbitrary numbers.

If you really want to do this kind of thing, you might consider
writing your own program that generates C source code. Personally,
I'd probably use Perl for the job, but that's just me. If you're on a
Unix-like system (more precisely, if you don't care about portability
to non-Unix-like systems), you might also look into the m4 macro
processor.
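
As a rough illustration of the generate-the-source approach, here is a minimal sketch written in C rather than Perl or m4. The function name emit_unrolled and the emitted statement text are assumptions chosen to match the do_something(A[i]) example used earlier in the thread. Unlike the doubling macros, it works for any count.

```c
#include <stdio.h>

/* Write `count` statements of the form do_something(A[i]); to `out`,
 * one per line. Returns the number of statements actually written. */
static int emit_unrolled(FILE *out, int count)
{
    int written = 0;

    for (int i = 0; i < count; i++) {
        if (fprintf(out, "do_something(A[%d]);\n", i) < 0)
            break;              /* stop on an output error */
        written++;
    }
    return written;
}
```

A small main() that calls emit_unrolled(stdout, 256) could be run from the makefile, with its output redirected to a file (say unrolled.inc, a hypothetical name) that the real source pulls in with #include at the point where the unrolled statements belong.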

Glen Herrmannsfeldt · Sep 20, 2003

Keith Thompson said:
I once heard about a language with a generics facility (closer to C++
templates than to macros) that could do the Towers of Hanoi problem at
compile time.

You can run into some nasty problems with tokenization. Try using a C
preprocessor on a language that has a standalone apostrophe token.

At least some versions of Fortran have one. For direct access files, IBM
Fortran has traditionally used READ(unit'block).

Many years ago I was working with a generic preprocessor called STEP,
written in Fortran, and trying to use it to preprocess Fortran. I had much
trouble with that one.

Getting back to C, I tend to think that C's preprocessor is powerful
enough, perhaps too powerful. Or maybe it's just not integrated into
the language cleanly enough.

As for the original question (using the preprocessor to manually
unroll a loop), I don't believe there's any clean way to do that.
Some of the suggested solutions work reasonably well for powers of 2,
but not for arbitrary numbers.

The PL/I preprocessor also has compile time procedure calls, among other
features.

Also, compile time %IF, so one can conditionally unroll a loop based on a
compile time constant. Early PL/I compilers were designed to run on small
machines, so machine size can't be the reason for the C preprocessor being
the way it is.
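
For comparison, the nearest C equivalent of a compile-time %IF is #if, which can choose between a rolled and a hand-unrolled version but cannot generate the copies itself. A minimal sketch, where UNROLL is a hypothetical build-time switch (for example set with -DUNROLL=0) and the array and accumulator are made up for the example:

```c
#ifndef UNROLL
#define UNROLL 1   /* hypothetical switch; override on the command line */
#endif

static int A[4] = {1, 2, 3, 4};
static int total;

static void do_something(int v) { total += v; }

static void process(void)
{
#if UNROLL
    /* Unrolled by hand; #if only selects this text, it does not
     * produce it. */
    do_something(A[0]);
    do_something(A[1]);
    do_something(A[2]);
    do_something(A[3]);
#else
    for (int i = 0; i < 4; i++)
        do_something(A[i]);
#endif
}
```

Either branch does the same work; only the shape of the generated code differs.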

(snip)

-- glen

Dave Thompson · Sep 22, 2003

On Sun said:
So I was looking to write something along the lines of:

#define write_256(array) #for(i,0,255,do_something(foo);

But I couldn't find a way to do it...

Is there a way to write a macro that the C pre-processor will expand
to k instructions, and be able to reference the iteration number?

Only the (clumsy) ways already given by Fluhrer and Delahaye.

#if outside_of_C /* especially if you need this often or vitally */
You could consider using another macro processor (like m4) or
text-processing utility (like sed, awk, perl), perhaps driven
automatically by your makefile or equivalent.
#endif

#ifdef __cplusplus
You can try (an invocation of) a template inline function that
"iterates" down to a partial specialization that terminates it; but
the C++ standard doesn't require inlines to actually be inlined, and
allows an implementation limit on nested/recursive invocation which is
only recommended to be at least 17.
#endif

Plus of course compiler features like -funroll-loops.

I thought of using recursive macro calls along the lines of:

#define write_more(n,array) \
#if (n>0) do_something(foo[256-n]); \
write_more(n-1,array)

but it seems implementers don't like recursive macro calls

Not just implementers; the standard requires that recursive macro
invocations (direct or indirect) not be expanded. And even if they
could be and were, there's no format of #if that works like that.
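
A small sketch of that rule in action, with made-up names: when a macro's replacement text mentions the macro's own name, that inner occurrence is left alone during rescanning. Here it ends up as a call to the ordinary function of the same name instead of expanding again.

```c
/* An ordinary function, defined before the macro of the same name. */
static int twice(int x) { return 2 * x; }

/* Function-like macro with the same name. The `twice` inside the
 * replacement list is not expanded again, so it refers to the
 * function above. */
#define twice(x) (1 + twice(x))

static int demo(void)
{
    return twice(3);   /* expands once, to (1 + twice(3)) */
}
```

So demo() returns 7, not the result of an endless expansion. A macro such as write_more(n-1, array) inside its own definition would likewise be left as a plain call to a function named write_more, which does not exist.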

- David.Thompson1 at worldnet.att.net
