inline functions

BartC · Nov 24, 2012

Is there a way of controlling which functions are inlined, or likely to be
inlined, and which ones you never want inlined? Especially with gcc...

(I have a file with over 300 functions, each of which is called at most
once. Strange things are happening with performance, such as being 60%
slower, when, for example, a function's body is populated with code (they
start off empty), even though the function is never actually called.

I assume this is due to selective inlining. At the moment I've had to
segregate the functions from the calls, to avoid undue influence on
performance, but want to benefit from inlining later on, without having to
re-integrate the functions one by one.)

SG · Nov 24, 2012

On 24.11.2012 13:20, BartC wrote:

Is there a way of controlling which functions are inlined, or likely to
be inlined, and which ones you never want inlined? Especially with gcc...

Yes, I think there is. RTFM.

Cheers!
SG

BartC · Nov 24, 2012

SG said:
On 24.11.2012 13:20, BartC wrote:

Yes, I think there is. RTFM.

OK. My experience of trying to extract useful information out of gcc docs is
that it's not very rewarding...

Joost Kraaijeveld · Nov 24, 2012

OK. My experience of trying to extract useful information out of gcc
docs is that it's not very rewarding...

Because a bare "RTFM" is not really helpful to begin with, and rude in
the worst case:

google "force inlining gcc", first entry:

http://stackoverflow.com/questions/8381293/how-do-i-force-gcc-to-inline-a-function

Jorgen Grahn · Nov 24, 2012

Is there a way of controlling which functions are inlined, or likely to be
inlined, and which ones you never want inlined? Especially with gcc...

(I have a file with over 300 functions, each of which is called at most
once. Strange things are happening with performance, such as being 60%
slower, when, for example, a function's body is populated with code (they
start off empty), even though the function is never actually called.

Care to provide a complete example? This seems highly unlikely -- a
function which is never called will simply be discarded by the compiler.
Worst case it will be sitting there doing nothing, perhaps making the
CPU's instruction cache work a little bit less well.

I assume this is due to selective inlining. At the moment I've had to
segregate the functions from the calls, to avoid undue influence on
performance, but want to benefit from inlining later on, without having to
re-integrate the functions one by one.)

I don't understand this. How can this be because of inlining, and yet
you don't have inlining?

More generally, my own rules of thumb:
- things I'm quite sure will be useful to inline, and which
  are called in several translation units:
  -> 'static inline' in some header file
- anything else:
  -> 'static', and trust the compiler to choose wisely
- play with the optimization level (-O2, -O3, -Os in gcc)
- look at the object code if I want to know the outcome

- in the unlikely case that I need more performance /and/ believe
  this is a worthwhile area to investigate, I'd read the GCC manual.
  I know it covers this.
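
A minimal sketch of the first two rules, assuming a hypothetical header clamp.h and a hypothetical user file; the names clamp_int and scale_and_clamp are made up for illustration and do not come from the thread:

```c
/* clamp.h: hypothetical header, included from several .c files */
#ifndef CLAMP_H
#define CLAMP_H

/* Small and likely hot: a candidate for 'static inline' in a header.
   Each translation unit gets its own copy, and the compiler is still
   free to inline it or not. */
static inline int clamp_int(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

#endif /* CLAMP_H */

/* user.c: hypothetical file; everything else stays plain 'static' and
   the inlining decision is left to the compiler. */
static int scale_and_clamp(int value)
{
    return clamp_int(value * 2, 0, 255);
}
```

Whether either function actually ends up inlined depends on the optimization level, so checking the object code is still the only way to know.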

/Jorgen

Malcolm McLean · Nov 24, 2012

You can take the address of a function. Call it somewhere indirectly using a bit of logic the compiler can't optimise out, eg

int (*foo)(int) = notinlined;
unsigned char buff[sizeof(foo)];
double x;
memcpy(buff, &foo, sizeof(buff));
x = sqrt((int)buff[0] * buff[0]);
if (x != floor(x))
    printf("%d\n", (*foo)(0));

it's a bit hacky. The compiler won't be clever enough to know that the square
root of a square is always an integer. Less drastic measures will work on
most compilers.

Alain Ketterlin · Nov 24, 2012

BartC said:
OK. My experience of trying to extract useful information out of gcc
docs is that it's not very rewarding...

Function attributes are described here:

http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html#Function-Attributes

Command line options are here:

http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Optimize-Options.html#Optimize-Options

Search for "inline". Be careful if you need portable code.

-- Alain.

BartC · Nov 24, 2012

Jorgen Grahn said:
Care to provide a complete example? This seems highly unlikely -- a
function which is never called will simply be discarded by the compiler.

The module is an interpreter dispatch loop. I'm experimenting with several;
this one has one function containing a switch statement with 300 cases. Each
case calls a corresponding function in the rest of the module.

So a call to each function exists in the source, but is never called in
practice (at least, not in the specific bytecode program being executed that
exhibited the slowdown).

The compiler has no way of knowing which of the 300 cases will be called (it
depends on what bytecode is being executed), nor how often. I, on the other
hand, have a much better idea! So I don't need rarely executed handlers
inlined for example.

There are various ways to tackle this, but it would be most convenient if
there was an attribute 'never_inline' (I think there is one to always
inline).

I don't understand this. How can this be because of inlining, and yet
you don't have inlining?

Inlining must be happening as a result of optimisation. I haven't yet
explicitly told it not to (except by moving all the functions to another
file).

More generally, my own rules of thumb:
- things I'm quite sure will be useful to inline, and which
are called in several translation units:
-> 'static inline' in some header file

I use 'static' on all of the functions (because they are not exported).
Perhaps I should look at that...

- look at the object code if I want to know the outcome

(That's another thing: I've tried several incantations to get gcc to produce
assembler in a .s file, that contains some cross-references to the original
C source. Haven't managed it yet..)

James Kuyper · Nov 24, 2012

Is there a way of controlling which functions are inlined, or likely to be
inlined, and which ones you never want inlined?

The 'inline' keyword is only a hint; it doesn't guarantee anything. The
only way to be sure a function won't be inlined is to define it in a
different translation unit. There's no way to be sure it will be inlined
- even if you inline it yourself, manually, a sufficiently clever
compiler might reach the conclusion that the common code used in
several different places in your program should be extracted into a
separate function.

... Especially with gcc...

gcc has dozens of options that contain the word "inline". I had planned
to read the description of each option, and recommend the one(s) that
seemed most relevant to your question. However, when I realized that
there were so many to choose from, I decided to let you do the reading.

James Kuyper · Nov 24, 2012

You can take the address of a function. Call it somewhere indirectly using a bit of logic the compiler can't optimise out, eg

int (*foo)(int) = notinlined;
unsigned char buff[sizeof(foo)];
double x;
memcpy(buff, &foo, sizeof(buff));
x = sqrt((int)buff[0] * buff[0]);
if (x != floor(x))
    printf("%d\n", (*foo)(0));

it's a bit hacky. The compiler won't be clever enough to know that the square
root of a square is always an integer. Less drastic measures will work on
most compilers.

In the long run, you're better off looking for ways (usually
implementation-specific) to command the compiler to do what you want,
than trying to trick it into doing so. If you can figure it out, a
sufficiently clever compiler will figure it out. The only guaranteed
effect is to confuse other humans who need to read the tricky code.

Alain Ketterlin · Nov 24, 2012

BartC said:
The module is an interpreter dispatch loop. I'm experimenting with several;
this one has one function containing a switch statement with 300 cases. Each
case calls a corresponding function in the rest of the module.

So a call to each function exists in the source, but is never called in
practice (at least, not in the specific bytecode program being executed that
exhibited the slowdown).

The compiler has no way of knowing which of the 300 cases will be called (it
depends on what bytecode is being executed), nor how often. I, on the other
hand, have a much better idea! So I don't need rarely executed handlers
inlined for example.

There are various ways to tackle this, but it would be most convenient if
there was an attribute 'never_inline' (I think there is one to always
inline).

It goes the other way round: there is a command line option
(-fno-inline-functions) to prevent inlining, and an attribute
always_inline to force inlining (of inline functions).

Inlining must be happening as a result of optimisation. I haven't yet
explicitly told it not to (except by moving all the functions to another
file).

If inlining is enabled, the compiler will decide whether or not to
inline on a per-callsite basis, not on a per-function (callee) basis. It
doesn't make much difference in your case iiuc. But keep in mind that
the call site is important. With a 300-case switch, it is likely that
the compiler will find your function too big and will refrain from
making it bigger.

I use 'static' on all of the functions (because they are not exported).
Perhaps I should look at that...

static will make the compiler not emit code when all calls have been
inlined. It should not influence the inlining decision.

(That's another thing: I've tried several incantations to get gcc to
produce assembler in a .s file, that contains some cross-references to
the original C source. Haven't managed it yet..)

objdump -d -l will print file:line on an image containing debug info.
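
A small sketch of how that can be tried out, assuming a hypothetical file square.c; the build commands are shown as comments and use only standard gcc and binutils options (-g, -c, -S, -fverbose-asm, objdump -d/-l/-S):

```c
/* square.c: hypothetical file name, used only for this illustration.
 *
 * Build with debug info, then disassemble:
 *
 *   gcc -O2 -g -c square.c
 *   objdump -d -l square.o     (disassembly with file:line markers)
 *   objdump -d -S square.o     (disassembly interleaved with source)
 *
 * Or ask gcc for commented assembler directly:
 *
 *   gcc -O2 -S -fverbose-asm square.c     (writes square.s)
 */
static int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    /* If square() was inlined, the disassembly of sum_of_squares
       contains no call instruction targeting square. */
    return square(a) + square(b);
}
```

Comparing the output at -O0 and -O2 should make it fairly obvious whether the calls are still there.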

-- Alain.

BartC · Nov 24, 2012

Alain Ketterlin said:
It goes the other way round: there is a command line option
(-fno-inline-functions) to prevent inlining, and an attribute
always_inline to force inlining (of inline functions).

I want to disable inlining of some functions (which I don't want inlined at
the expense of more useful ones), but still benefit from it with other
functions that I consider are called more frequently.

If 'always_inline' overrides -fno-inline-functions, that will do the trick.
But then your other link said there was a specific 'noinline' attribute,
which will be perfect if it works. Thanks!

If inlining is enabled, the compiler will decide whether or not to
inline on a per-callsite basis, not on a per-function (callee) basis. It
doesn't make much difference in your case iiuc. But keep in mind that
the call site is important. With a 300-case switch, it is likely that
the compiler will find your function too big and will refrain from
making it bigger.

Hence the need to control which ones *are* inlined. But I don't know why it
cares how big my function is, if I'm optimising for speed. Doubtless there
will be an option for that somewhere too.

(Of course I can always just write the code inline anyway, but it's
difficult to manage and gets unwieldy at a source code level.)

Alain Ketterlin · Nov 24, 2012

BartC said:
I want to disable inlining of some functions (which I don't want inlined at
the expense of more useful ones), but still benefit from it with other
functions that I consider are called more frequently.

If 'always_inline' overrides -fno-inline-functions, that will do the trick.
But then your other link said there was a specific 'noinline'
attribute, which will be perfect if it works. Thanks!

Hmm, yes, gcc's options/attributes are many...

Check also the various debugging options, especially -fdump-ipa-inline
(the objdump -l trick may be easier, though, if all you want is to check
whether calls are still there).

[...]

Hence the need to control which ones *are* inlined. But I don't know why it
cares how big my function is, if I'm optimising for speed.

It's really hard to control code size, i.e., avoid explosion. I guess
the compiler has to be somewhat conservative.

Doubtless there will be an option for that somewhere too.

Gcc has parameters (that you can set with --param if you want) to set
various limits, e.g., large-function-growth and large-function-insns.

(Of course I can always just write the code inline anyway, but it's
difficult to manage and gets unwieldy at a source code level.)

Sure.

-- Alain.

Johann Klammer · Nov 24, 2012

BartC said:
OK. My experience of trying to extract useful information out of gcc
docs is that it's not very rewarding...

The problem is that most people try the manpage first, which is an ugly
unstructured lump of information. Try the info manual or the html
version instead. It is much better structured.

Jens Gustedt · Nov 24, 2012

On 24.11.2012 18:13, Alain Ketterlin wrote:

Hmm, yes, gcc's options/attributes are many...

Check also the various debugging options, especially -fdump-ipa-inline
(the objdump -l trick may be easier, though, if all you want is to check
whether calls are still there).

or change to clang, that produces quite readable, annotated assembler
with the -S option, for which it is much easier to keep track of the
source lines, much better than gcc

Jens

Philip Lantz · Nov 25, 2012

BartC said:
Is there a way of controlling which functions are inlined, or likely to be
inlined, and which ones you never want inlined ... with gcc?

Yes. I use the following macros

#define ALWAYS_INLINE __attribute__((always_inline))
#define NOINLINE __attribute__((noinline))
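
A sketch of how these macros might be applied to the kind of dispatch loop discussed in this thread. The opcodes, the struct vm type and the handler names are hypothetical, and the toy stack has no bounds checks; only the two attributes themselves are gcc's:

```c
#define ALWAYS_INLINE __attribute__((always_inline))
#define NOINLINE      __attribute__((noinline))

/* Hypothetical opcodes for a toy bytecode. */
enum { OP_PUSH1, OP_ADD, OP_DUMP_STATE, OP_HALT };

/* Toy interpreter state; no overflow checks, for brevity. */
struct vm {
    int stack[16];
    int sp;
    int dumps;
};

/* Frequently executed handlers: ask gcc to inline them into the loop. */
static inline ALWAYS_INLINE void op_push1(struct vm *vm)
{
    vm->stack[vm->sp++] = 1;
}

static inline ALWAYS_INLINE void op_add(struct vm *vm)
{
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}

/* Rarely executed handler: keep it out of line so it does not
   enlarge the dispatch function. */
static NOINLINE void op_dump_state(struct vm *vm)
{
    vm->dumps++;
}

/* Runs bytecode until OP_HALT; returns the top of stack, 0 if the
   stack is empty, or -1 on an unknown opcode. */
static int run(const unsigned char *code)
{
    struct vm vm = { {0}, 0, 0 };

    for (;;) {
        switch (*code++) {
        case OP_PUSH1:      op_push1(&vm);      break;
        case OP_ADD:        op_add(&vm);        break;
        case OP_DUMP_STATE: op_dump_state(&vm); break;
        case OP_HALT:
            return vm.sp > 0 ? vm.stack[vm.sp - 1] : 0;
        default:
            return -1;
        }
    }
}
```

Combining always_inline with 'static inline' avoids gcc's warning about always_inline functions that are not declared inline. Both attributes are gcc extensions, so portable code would need the macros to expand to nothing on other compilers.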

James Dow Allen · Nov 25, 2012

Strange things are happening with performance, such as being 60%
slower, when, for example, a function's body is populated with code (they
start off empty), even though the function is never actually called.

The problem might be cache collisions, with two or more
pieces of executing code or data having the same low-order
address bits. The uncalled function is linked between two
called functions whose relative addresses change.
Two experiments to investigate this possibility:
 1. Relocate the offending uncalled function: either
    its position in a module, or the module's position in
    the load list.
 2. Try *increasing* the amount of code in the uncalled function:
    you'll eventually get back to a "sweet spot" where execution
    is fast again.

Details may be complicated if your machine has two or more
interacting caches. And a permanent fix may be difficult even
if you identify such cache collisions as the cause of slowdown.

James

Jorgen Grahn · Nov 25, 2012

I think BartC is responding to this by me, which he snipped:

static will make the compiler not emit code when all calls have been
inlined. It should not influence the inlining decision.

Why not? Just the emit/not emit choice should influence the inlining
decision, because this in turn influences code size.

Let's say I have this in a translation unit:

int foo(void) { /* pages of code */ return 0; }
int bar(void) {
    int n = foo();
    /* other stuff */
    return n;
}

I'd expect a compiler to have a conservative inlining mode which
doesn't inline foo() in this case, to avoid code bloat.
(Especially since you're almost certain to have foo() calls in other
translation units, or you would have made it static.)

/Jorgen

Alain Ketterlin · Nov 25, 2012

[...]

Why not?

You're right, my phrasing was misleading. What I meant was: static will
not *by itself* determine whether a function is always inlined or not.

By the way, static should always be used when it applies, because it
also lets the compiler emit non strictly ABI-compliant code, which may
be faster.

-- Alain.

BartC · Nov 26, 2012

Perhaps what you are seeing is that all the inlining is increasing
the size of your inner loop so much that it is exceeding the L1
instruction cache, and thus you are seeing a slow-down.

I know little about instruction caches. I wouldn't have thought all these
functions together were that big (the whole program is only 120K and might
be double that when finished). Would it help if the most common functions
(or perhaps the smallest), were together? (But I don't even know if gcc
reorders my functions anyway.)

A couple of other ideas spring to mind for the program. One is to
re-structure with a table of function pointers rather than a huge switch -
it might also make the program clearer and easier to work with.

I'm testing three approaches: a giant switch statement, a table of label
pointers (specific to gcc), and a table of function pointers. Some timings
(for a set of combined benchmarks) are:

Switch: 75 seconds (74 with switch range-check disabled)
Label ptrs: 66 seconds
Function ptrs: 69 seconds

When I temporarily bring back the 350 (non-static) functions into the same
file, and relying on whatever default inlining gcc decides, I get these
results:

Switch: 69 seconds
Label ptrs: 57 seconds
Function ptrs: 70 seconds

With both Switch and Label pointers, there is a specific call in the source
to each of the 350 functions, and inlining is possible. With function
pointers, there is just one call, so inlining can't happen. So that's a big
disadvantage. (While the sets of switch and label calls are anyway generated
automatically.)
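
For readers unfamiliar with the three approaches, here is a toy sketch of each. It is not the interpreter discussed above: the opcodes, the handlers and the single-accumulator machine are hypothetical, and all three loops assume well-formed bytecode that ends in OP_HALT. The label-pointer version relies on gcc's "labels as values" extension.

```c
/* Hypothetical opcodes for a one-accumulator toy machine. */
enum { OP_INC, OP_DEC, OP_HALT, OP_COUNT };

static void do_inc(int *acc) { ++*acc; }
static void do_dec(int *acc) { --*acc; }

/* 1. Switch: one explicit call per handler, so each call site
      is a candidate for inlining. */
int run_switch(const unsigned char *pc)
{
    int acc = 0;

    for (;;) {
        switch (*pc++) {
        case OP_INC:  do_inc(&acc); break;
        case OP_DEC:  do_dec(&acc); break;
        case OP_HALT: return acc;
        default:      return -1;   /* unknown opcode */
        }
    }
}

/* 2. Label pointers (gcc extension): each handler jumps straight to
      the next one; the explicit calls can still be inlined. */
int run_labels(const unsigned char *pc)
{
    static void *const targets[OP_COUNT] = { &&l_inc, &&l_dec, &&l_halt };
    int acc = 0;

    goto *targets[*pc++];
l_inc:
    do_inc(&acc);
    goto *targets[*pc++];
l_dec:
    do_dec(&acc);
    goto *targets[*pc++];
l_halt:
    return acc;
}

/* 3. Function pointers: a single indirect call site, which the
      compiler generally cannot inline. */
typedef void (*handler_fn)(int *acc);

static const handler_fn handlers[OP_COUNT] = { do_inc, do_dec, 0 };

int run_table(const unsigned char *pc)
{
    int acc = 0;

    while (*pc != OP_HALT)
        handlers[*pc++](&acc);
    return acc;
}
```

The structural difference matches the point made above: versions 1 and 2 name every handler at its own call site, while version 3 reaches all of them through one indirect call.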

(For comparison, an older version of this interpreter, with a large ASM
component, benchmarked at 39 seconds. However the source code is a mess.

The C version has a bit of catching up to do yet. However it's currently
still better, typically, than the same programs executed under Perl, Python
or Ruby. Although I cheat a bit by not having auto-ranging integer
arithmetic in the C version; it's too big an overhead.)
