1. inline is not a portable feature of C.
2. There is no guarantee that inlining everything is going to speed up
your code. Larger executable means less efficient usage of the
processor cache.
A larger executable may mean less efficient use of the processor
cache, but not necessarily. As an obvious example, judicious loop
unrolling (by the compiler) often produces larger but faster code.
Of course, everything in moderation: unroll too much and you will
cause icache (or trace cache) thrashing, because your working sets
keep pushing each other out. That's really the essence of what Dan
was getting at.
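
To make the size-versus-speed tradeoff concrete, here is a rough
sketch of what unrolling by four looks like when written out by hand.
The function names are made up for illustration, and whether the
unrolled version is actually faster depends on your compiler and
target, so measure before believing it.

```c
#include <stddef.h>

/* Straightforward loop: one add and one loop branch per element. */
long sum_simple(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-unrolled by four (illustrative only): more instructions in the
 * binary, but fewer loop-control branches executed per element. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n % 4 != 0 */
        total += a[i];
    return total;
}
```

An optimizing compiler will often perform this transformation on its
own, which is usually preferable to doing it by hand.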
As usual, there are a number of competing factors. Inlining small,
frequently *called* functions may be beneficial because it eliminates
function call overhead and allows common subexpression elimination
(CSE) and other optimizations to be performed within the context of
the caller. This can translate to fewer branches, which reduces the
likelihood of costly branch mispredictions.
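
A small sketch of the kind of function I have in mind, assuming a
C99 compiler (or one that accepts inline as an extension). The names
clamp and scaled_pair are invented for this example. Once clamp is
inlined, the compiler can see that both calls compute the same value
and may evaluate it only once.

```c
/* Hypothetical helper: so small that call overhead would be a large
 * fraction of its execution time. */
static inline int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Both calls have identical arguments. With clamp expanded in place,
 * the compiler is free to apply CSE and compute the clamped value a
 * single time. */
int scaled_pair(int x)
{
    return clamp(x, 0, 255) * 3 + clamp(x, 0, 255);
}
```

Whether the compiler actually merges the two evaluations depends on
the optimization level, so treat this as an illustration and not a
promise.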
Now I just said small, frequently *called* functions, but I did not
mean frequently *used* functions, unless they're so small that
function call overhead is more than, or a significant percentage of,
their execution time. By frequently used functions, I mean the ones
you find sprinkled all over the code, but not typically in performance
sensitive areas. These are what you do not want to inline, because
the code bloat is not worth it.
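
As a rough illustration of a frequently *used* function, consider an
error-reporting helper. The names below are hypothetical. It is
called from many places, but never from a hot loop, so a single
out-of-line copy is probably the better deal.

```c
#include <stdio.h>

/* Hypothetical helper: used all over the code, but only on error
 * paths. Left as an ordinary function so the binary holds one copy. */
static int report_error(const char *where, int code)
{
    fprintf(stderr, "%s: error %d\n", where, code);
    return -1;
}

/* Two of the many (imagined) call sites. */
int open_config(int ok)
{
    return ok ? 0 : report_error("open_config", 2);
}

int parse_header(int ok)
{
    return ok ? 0 : report_error("parse_header", 22);
}
```

The compiler remains free to make its own decision about a static
function like this one, so check the output if it matters to you.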
One big problem is that your inline function looks like a function to
you, and in the source code, but not to the processor. Each call site
gets its own copy of the code, so inline functions will not cause a
hot spot in the cache. They decrease locality. If your program calls
a frequently used inline function twice reasonably near each other,
for example, the second call will *not* find the function already
sitting in cache, ready to go.
Granted, hardware prefetch may cause this to be a moot point in some
cases, but then again, the hardware prefetch would then be fetching
something that should already be in the cache, and displacing
something else that may be beneficial.
So what can you inline? Well, the best functions to inline are the
ones that are static and called from exactly one place. It's hard to
go wrong with that. With anything else, you need to profile your code
and be familiar with the relative costs your architecture imposes on
you. As always, when in doubt, check your compiler's assembly output.
You may be pleasantly surprised to find that your compiler is good at
figuring out which functions to inline for you at certain
optimization levels. And if you religiously use static functions, as
you should, the compiler has a much easier time doing just that.
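
Here is a minimal sketch of the static, single-call-site case. The
file name once.c and the function names are assumptions for the
example. The command in the comment uses the common -O2 and -S
options accepted by GCC and Clang. Other compilers have their own
way of emitting assembly.

```c
/* Hypothetical file: once.c
 * One way to see what the compiler did:
 *     cc -O2 -S once.c
 * then look in once.s for a call to checksum_step. If the function
 * was inlined, no such call (and likely no such symbol) remains. */

/* static and called from exactly one place, so the compiler knows
 * that no other translation unit needs a separate copy. */
static unsigned checksum_step(unsigned acc, unsigned char byte)
{
    return (acc << 5) + acc + byte;     /* acc * 33 + byte */
}

unsigned checksum(const unsigned char *buf, unsigned long len)
{
    unsigned acc = 5381;

    for (unsigned long i = 0; i < len; i++)
        acc = checksum_step(acc, buf[i]);
    return acc;
}
```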
If you're serious about code performance, you need to be familiar with
the applicable profiling tools and with the increasingly popular
profile-driven optimization. Profile-driven optimization gives the
compiler a much better idea of which branches are taken and which are
not, so that it can output better assembly for the common case. When
you combine that with profiling and subsequent inlining and tweaking,
you have the possibility of some very well performing code. But
you're not going to get that by simply inlining everything, that's
for sure.
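
For what it's worth, here is roughly what a profile-driven build can
look like with GCC. The -fprofile-generate and -fprofile-use flags
are GCC's. The file names classify.c and main.c, the program name
app, and the input file are all made up for the example, and other
compilers spell this differently.

```c
/* Hypothetical file: classify.c, part of a larger program.
 * One possible GCC workflow:
 *     gcc -O2 -fprofile-generate classify.c main.c -o app
 *     ./app < typical_input      (run on representative data)
 *     gcc -O2 -fprofile-use classify.c main.c -o app
 */
#include <stddef.h>

/* Counts 0xFF bytes. The assumption here is that such bytes are rare
 * in typical input, which is something the profile can tell the
 * compiler but the source code alone cannot. */
size_t count_rare(const unsigned char *buf, size_t n)
{
    size_t rare = 0;

    for (size_t i = 0; i < n; i++) {
        if (buf[i] == 0xFF)     /* rarely taken (assumed) */
            rare++;
    }
    return rare;
}
```

The profile is only as good as the workload you feed the
instrumented binary, so use input that resembles real usage.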
Mark F. Haigh