inline functions

Discussion in 'C Programming' started by BartC, Nov 24, 2012.

  1. BartC

    BartC Guest

    Is there a way of controlling which functions are inlined, or likely to be
    inlined, and which ones you never want inlined? Especially with gcc...

    (I have a file with over 300 functions, each of which is called at most
    once. Strange things are happening with performance, such as being 60%
    slower, when, for example, a function's body is populated with code (they
    start off empty), even though the function is never actually called.

    I assume this is due to selective inlining. At the moment I've had to
    segregate the functions from the calls, to avoid undue influence on
    performance, but want to benefit from inlining later on, without having to
    re-integrate the functions one by one.)

    --
    Bartc
     
    BartC, Nov 24, 2012
    #1

  2. BartC

    SG Guest

    On 24.11.2012 13:20, BartC wrote:
    > Is there a way of controlling which functions are inlined, or likely to
    > be inlined, and which ones you never want inlined? Especially with gcc...


    Yes, I think there is. RTFM.

    Cheers!
    SG
     
    SG, Nov 24, 2012
    #2

  3. BartC

    BartC Guest

    "SG" <> wrote in message
    news:k8qje7$je1$...
    > On 24.11.2012 13:20, BartC wrote:
    >> Is there a way of controlling which functions are inlined, or likely to
    >> be inlined, and which ones you never want inlined? Especially with gcc...

    >
    > Yes, I think there is. RTFM.


    OK. My experience of trying to extract useful information out of gcc docs is
    that it's not very rewarding...

    --
    Bartc
     
    BartC, Nov 24, 2012
    #3
  4. On 24/11/12 15:48, BartC wrote:
    >
    >
    > "SG" <> wrote in message
    > news:k8qje7$je1$...
    >> On 24.11.2012 13:20, BartC wrote:
    >>> Is there a way of controlling which functions are inlined, or likely to
    >>> be inlined, and which ones you never want inlined? Especially with
    >>> gcc...

    >>
    >> Yes, I think there is. RTFM.

    >
    > OK. My experience of trying to extract useful information out of gcc
    > docs is that it's not very rewarding...
    >

    Because just "RTFM" is not really helpful to begin with, and rude in
    the worst case:

    google "force inlining gcc", first entry:

    http://stackoverflow.com/questions/8381293/how-do-i-force-gcc-to-inline-a-function
     
    Joost Kraaijeveld, Nov 24, 2012
    #4
  5. BartC

    Jorgen Grahn Guest

    On Sat, 2012-11-24, BartC wrote:
    > Is there a way of controlling which functions are inlined, or likely to be
    > inlined, and which ones you never want inlined? Especially with gcc...
    >
    > (I have a file with over 300 functions, each of which is called at most
    > once. Strange things are happening with performance, such as being 60%
    > slower, when, for example, a function's body is populated with code (they
    > start off empty), even though the function is never actually called.


    Care to provide a complete example? This seems highly unlikely -- a
    function which is never called will simply be discarded by the compiler.
    Worst case it will be sitting there doing nothing, perhaps making the
    CPU's instruction cache work a little bit less well.

    > I assume this is due to selective inlining. At the moment I've had to
    > segregate the functions from the calls, to avoid undue influence on
    > performance, but want to benefit from inlining later on, without having to
    > re-integrate the functions one by one.)


    I don't understand this. How can this be because of inlining, and yet
    you don't have inlining?

    More generally, my own rules of thumb:
    - things I'm quite sure will be useful to inline, and which
      are called in several translation units:
      -> 'static inline' in some header file
    - anything else:
      -> 'static', and trust the compiler to choose wisely
    - play with the optimization level (-O2, -O3, -Os in gcc)
    - look at the object code if I want to know the outcome

    - in the unlikely case that I need more performance /and/ believe
      this is a worthwhile area to investigate, I'd read the GCC manual.
      I know it covers this.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
     
    Jorgen Grahn, Nov 24, 2012
    #5
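A minimal sketch of the first two rules of thumb in post #5. The header name `clamp.h` and both function names are made up for illustration; the header and a source file that uses it are shown together in one listing only to keep the example short.

```c
/* clamp.h (hypothetical header name) */
#ifndef CLAMP_H
#define CLAMP_H

#include <assert.h>

/* Small helper expected to be worth inlining and used from several
   translation units: 'static inline' gives every unit its own private
   copy, which the compiler is free to inline or not. */
static inline int clamp_int(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

#endif /* CLAMP_H */

/* In a .c file: anything else is plain 'static', and the inlining
   decision is left to the compiler and the optimization level. */
static int scale_percent(int value, int percent)
{
    return clamp_int(value * percent / 100, 0, 1000);
}
```

Calling `scale_percent(900, 200)` yields 1000, because the scaled value 1800 is clamped to the upper bound. Whether either function was actually inlined can only be seen in the object code.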
  6. You can take the address of a function. Call it somewhere indirectly using a bit of logic the compiler can't optimise out, e.g.

    /* needs <math.h>, <stdio.h> and <string.h> */
    int (*foo)(int) = notinlined;
    unsigned char buff[sizeof(foo)];
    double x;
    memcpy(buff, &foo, sizeof(buff));
    x = sqrt((double)((int)buff[0] * buff[0]));
    if (x != floor(x))
        printf("%d\n", (*foo)(0));

    It's a bit hacky. The compiler won't be clever enough to know that the square
    root of a square is always an integer. Less drastic measures will work on
    most compilers.
     
    Malcolm McLean, Nov 24, 2012
    #6
  7. "BartC" <> writes:

    > "SG" <> wrote in message
    > news:k8qje7$je1$...
    >> On 24.11.2012 13:20, BartC wrote:
    >>> Is there a way of controlling which functions are inlined, or likely to
    >>> be inlined, and which ones you never want inlined? Especially with gcc...

    >>
    >> Yes, I think there is. RTFM.

    >
    > OK. My experience of trying to extract useful information out of gcc
    > docs is that it's not very rewarding...


    Function attributes are described here:

    http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html#Function-Attributes

    Command line options are here:

    http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Optimize-Options.html#Optimize-Options

    Search for "inline". Be careful if you need portable code.

    -- Alain.
     
    Alain Ketterlin, Nov 24, 2012
    #7
  8. BartC

    BartC Guest

    "Jorgen Grahn" <> wrote in message
    news:...
    > On Sat, 2012-11-24, BartC wrote:
    >> Is there a way of controlling which functions are inlined, or likely to
    >> be inlined, and which ones you never want inlined? Especially with gcc...
    >>
    >> (I have a file with over 300 functions, each of which is called at most
    >> once. Strange things are happening with performance, such as being 60%
    >> slower, when, for example, a function's body is populated with code (they
    >> start off empty), even though the function is never actually called.

    >
    > Care to provide a complete example? This seems highly unlikely -- a
    > function which is never called will simply be discarded by the compiler.


    The module is an interpreter dispatch loop. I'm experimenting with several;
    this one has one function containing a switch statement with 300 cases. Each
    case calls a corresponding function in the rest of the module.

    So a call to each function exists in the source, but is never called in
    practice (at least, not in the specific bytecode program being executed that
    exhibited the slowdown).

    The compiler has no way of knowing which of the 300 cases will be called (it
    depends on what bytecode is being executed), nor how often. I, on the other
    hand, have a much better idea! So I don't need rarely executed handlers
    inlined for example.

    There are various ways to tackle this, but it would be most convenient if
    there was an attribute 'never_inline' (I think there is one to always
    inline).

    >> I assume this is due to selective inlining. At the moment I've had to
    >> segregate the functions from the calls, to avoid undue influence on
    >> performance, but want to benefit from inlining later on, without having
    >> to re-integrate the functions one by one.)

    >
    > I don't understand this. How can this be because of inlining, and yet
    > you don't have inlining?


    Inlining must be happening as a result of optimisation. I haven't yet
    explicitly told it not to (except by moving all the functions to another
    file).

    > More generally, my own rules of thumb:
    > - things I'm quite sure will be useful to inline, and which
    > are called in several translation units:
    > -> 'static inline' in some header file


    I use 'static' on all of the functions (because they are not exported).
    Perhaps I should look at that...

    > - look at the object code if I want to know the outcome


    (That's another thing: I've tried several incantations to get gcc to produce
    assembler in a .s file, that contains some cross-references to the original
    C source. Haven't managed it yet..)

    --
    Bartc
     
    BartC, Nov 24, 2012
    #8
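The dispatch loop described in post #8 might look roughly like the sketch below: one function with a switch, where each case calls a `static` handler defined elsewhere in the same module. The opcodes, the `struct vm` layout and all names are invented for illustration (the real interpreter has some 300 cases and is not shown in the thread).

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct vm {
    int stack[16];
    size_t sp;                  /* number of values currently on the stack */
};

/* One static handler per bytecode; each is called from exactly one place. */
static void do_push(struct vm *vm, int value)
{
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = value;
}

static void do_add(struct vm *vm)
{
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}

static void do_mul(struct vm *vm)
{
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] *= vm->stack[vm->sp];
}

/* Executes bytecode until OP_HALT and returns the value on top of the stack. */
static int run(const int *code)
{
    struct vm vm = { {0}, 0 };
    size_t pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: do_push(&vm, code[pc++]); break;
        case OP_ADD:  do_add(&vm);              break;
        case OP_MUL:  do_mul(&vm);              break;
        case OP_HALT:
            assert(vm.sp >= 1);
            return vm.stack[vm.sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

Because every handler has a single call site in the same translation unit, the optimizer may fold any of them into `run()`, including handlers that the bytecode being benchmarked never reaches. That is one plausible way for code in a never-executed handler to change the layout and speed of the loop.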
  9. BartC

    James Kuyper Guest

    On 11/24/2012 07:20 AM, BartC wrote:
    > Is there a way of controlling which functions are inlined, or likely to be
    > inlined, and which ones you never want inlined?


    The 'inline' keyword is only a hint; it doesn't guarantee anything. The
    only way to be sure a function won't be inlined is to define it in a
    different translation unit. There's no way to be sure it will be inlined
    - even if you inline it yourself, manually, a sufficiently clever
    compiler might reach the conclusion that the common code used in
    several different places in your program should be extracted into a
    separate function.

    > ... Especially with gcc...


    gcc has dozens of options that contain the word "inline". I had planned
    to read the description of each option, and recommend the one(s) that
    seemed most relevant to your question. However, when I realized that
    there were so many to choose from, I decided to let you do the reading.
    --
    James Kuyper
     
    James Kuyper, Nov 24, 2012
    #9
  10. BartC

    James Kuyper Guest

    On 11/24/2012 10:20 AM, Malcolm McLean wrote:
    > You can take the address of a function. Call it somewhere indirectly using a bit of logic the compiler can't optimise out, eg
    >
    > int (*foo)(int) = notinlined;
    > unsigned char buff[ sizeof(foo) ];
    > memcpy(buff, &foo, sizeof(buff));
    > x = sqrt((int)buff[0] * buff[0]);
    > if(x != floor(x))
    > printf("%d\n", (*foo)(0));
    >
    > it's a bit hacky. The compiler won't be clever enough to know that the square
    > root of a square is always an integer. Less drastic measures will work on
    > most compilers.


    In the long run, you're better off looking for ways (usually
    implementation-specific) to command the compiler to do what you want,
    than trying to trick it into doing so. If you can figure it out, a
    sufficiently clever compiler will figure it out. The only guaranteed
    effect is to confuse other humans who need to read the tricky code.
    --
    James Kuyper
     
    James Kuyper, Nov 24, 2012
    #10
  11. "BartC" <> writes:

    > "Jorgen Grahn" <> wrote in message
    > news:...
    >> On Sat, 2012-11-24, BartC wrote:
    >>> Is there a way of controlling which functions are inlined, or likely to
    >>> be inlined, and which ones you never want inlined? Especially with gcc...
    >>>
    >>> (I have a file with over 300 functions, each of which is called at most
    >>> once. Strange things are happening with performance, such as being 60%
    >>> slower, when, for example, a function's body is populated with code (they
    >>> start off empty), even though the function is never actually called.

    >>
    >> Care to provide a complete example? This seems highly unlikely -- a
    >> function which is never called will simply be discarded by the compiler.

    >
    > The module is an interpreter dispatch loop. I'm experimenting with several;
    > this one has one function containing a switch statement with 300 cases. Each
    > case calls a corresponding function in the rest of the module.
    >
    > So a call to each function exists in the source, but is never called in
    > practice (at least, not in the specific bytecode program being executed that
    > exhibited the slowdown).
    >
    > The compiler has no way of knowing which of the 300 cases will be called (it
    > depends on what bytecode is being executed), nor how often. I, on the other
    > hand, have a much better idea! So I don't need rarely executed handlers
    > inlined for example.
    >
    > There are various ways to tackle this, but it would be most convenient if
    > there was an attribute 'never_inline' (I think there is one to always
    > inline).


    It goes the other way round: there is a command line option
    (-fno-inline-functions) to prevent inlining, and an attribute
    always_inline to force inlining (of inline functions).

    >>> I assume this is due to selective inlining. At the moment I've had to
    >>> segregate the functions from the calls, to avoid undue influence on
    >>> performance, but want to benefit from inlining later on, without having
    >>> to re-integrate the functions one by one.)

    >>
    >> I don't understand this. How can this be because of inlining, and yet
    >> you don't have inlining?

    >
    > Inlining must be happening as a result of optimisation. I haven't yet
    > explicitly told it not to (except by moving all the functions to another
    > file).


    If inlining is enabled, the compiler will decide whether or not to
    inline on a per-callsite basis, not on a per-function (callee) basis. It
    doesn't make much difference in your case iiuc. But keep in mind that
    the call site is important. With a 300-case switch, it is likely that
    the compiler will find your function too big and will refrain from
    making it bigger.

    >> More generally, my own rules of thumb:
    >> - things I'm quite sure will be useful to inline, and which
    >> are called in several translation units:
    >> -> 'static inline' in some header file

    >
    > I use 'static' on all of the functions (because they are not exported).
    > Perhaps I should look at that...


    static will make the compiler not emit code when all calls have been
    inlined. It should not influence the inlining decision.

    >> - look at the object code if I want to know the outcome

    >
    > (That's another thing: I've tried several incantations to get gcc to
    > produce assembler in a .s file, that contains some cross-references to
    > the original C source. Haven't managed it yet..)


    objdump -d -l will print file:line on an image containing debug info.

    -- Alain.
     
    Alain Ketterlin, Nov 24, 2012
    #11
  12. BartC

    BartC Guest

    "Alain Ketterlin" <-strasbg.fr> wrote in message
    news:-strasbg.fr...
    > "BartC" <> writes:


    >> There are various ways to tackle this, but it would be most convenient if
    >> there was an attribute 'never_inline' (I think there is one to always
    >> inline).

    >
    > It goes the other way round: there is a command line option
    > (-fno-inline-functions) to prevent inlining, and an attribute
    > always_inline to force inlining (of inline functions).


    I want to disable inlining of some functions (which I don't want inlined at
    the expense of more useful ones), but still benefit from it with other
    functions that I consider are called more frequently.

    If 'always_inline' overrides -fno-inline-functions, that will do the trick.
    But then your other link said there was a specific 'noinline' attribute,
    which will be perfect if it works. Thanks!

    >> Inlining must be happening as a result of optimisation. I haven't yet
    >> explicitly told it not to (except by moving all the functions to another
    >> file).

    >
    > If inlining is enabled, the compiler will decide whether or not to
    > inline on a per-callsite basis, not on a per-function (callee) basis. It
    > doesn't make much difference in your case iiuc. But keep in mind that
    > the call site is important. With a 300-case switch, it is likely that
    > the compiler will find your function too big and will refrain from
    > making it bigger.


    Hence the need to control which ones *are* inlined. But I don't know why it
    cares how big my function is, if I'm optimising for speed. Doubtless there
    will be an option for that somewhere too.

    (Of course I can always just write the code inline anyway, but it's
    difficult to manage and gets unwieldy at a source code level.)

    --
    Bartc
     
    BartC, Nov 24, 2012
    #12
  13. "BartC" <> writes:

    >>> There are various ways to tackle this, but it would be most convenient if
    >>> there was an attribute 'never_inline' (I think there is one to always
    >>> inline).

    >>
    >> It goes the other way round: there is a command line option
    >> (-fno-inline-functions) to prevent inlining, and an attribute
    >> always_inline to force inlining (of inline functions).

    >
    > I want to disable inlining of some functions (which I don't want inlined at
    > the expense of more useful ones), but still benefit from it with other
    > functions that I consider are called more frequently.
    >
    > If 'always_inline' overrides -fno-inline-functions, that will do the trick.
    > But then your other link said there was a specific 'noinline'
    > attribute, which will be perfect if it works. Thanks!


    Hmm, yes, gcc's options/attributes are many...

    Check also the various debugging options, especially -fdump-ipa-inline
    (the objdump -l trick may be easier, though, if all you want is to check
    whether calls are still there).

    [...]
    > Hence the need to control which ones *are* inlined. But I don't know why it
    > cares how big my function is, if I'm optimising for speed.


    It's really hard to control code size, i.e., avoid explosion. I guess
    the compiler has to be somewhat conservative.

    > Doubtless there will be an option for that somewhere too.


    Gcc has parameters (that you can set with --param if you want) to set
    various limits, e.g., large-function-growth and large-function-insns.

    > (Of course I can always just write the code inline anyway, but it's
    > difficult to manage and gets unwieldy at a source code level.)


    Sure.

    -- Alain.
     
    Alain Ketterlin, Nov 24, 2012
    #13
  14. BartC wrote:
    >
    >
    > "SG" <> wrote in message
    > news:k8qje7$je1$...
    >> On 24.11.2012 13:20, BartC wrote:
    >>> Is there a way of controlling which functions are inlined, or likely to
    >>> be inlined, and which ones you never want inlined? Especially with
    >>> gcc...

    >>
    >> Yes, I think there is. RTFM.

    >
    > OK. My experience of trying to extract useful information out of gcc
    > docs is that it's not very rewarding...
    >


    The problem is that most people try the manpage first, which is an ugly
    unstructured lump of information. Try the info manual or the html
    version instead. It is much better structured.
     
    Johann Klammer, Nov 24, 2012
    #14
  15. BartC

    Jens Gustedt Guest

    On 24.11.2012 18:13, Alain Ketterlin wrote:
    > "BartC" <> writes:
    >
    >>>> There are various ways to tackle this, but it would be most convenient if
    >>>> there was an attribute 'never_inline' (I think there is one to always
    >>>> inline).
    >>>
    >>> It goes the other way round: there is a command line option
    >>> (-fno-inline-functions) to prevent inlining, and an attribute
    >>> always_inline to force inlining (of inline functions).

    >>
    >> I want to disable inlining of some functions (which I don't want inlined at
    >> the expense of more useful ones), but still benefit from it with other
    >> functions that I consider are called more frequently.
    >>
    >> If 'always_inline' overrides -fno-inline-functions, that will do the trick.
    >> But then your other link said there was a specific 'noinline'
    >> attribute, which will be perfect if it works. Thanks!

    >
    > Hmm, yes, gcc's options/attributes are many...
    >
    > Check also the various debugging options, especially -fdump-ipa-inline
    > (the objdump -l trick may be easier, though, if all you want is to check
    > whether calls are still there).


    Or switch to clang, which produces quite readable, annotated assembler
    with the -S option. It is much easier to keep track of the source lines
    there than in gcc's output.

    Jens
     
    Jens Gustedt, Nov 24, 2012
    #15
  16. BartC

    Philip Lantz Guest

    BartC wrote:
    >
    > Is there a way of controlling which functions are inlined, or likely to be
    > inlined, and which ones you never want inlined ... with gcc?


    Yes. I use the following macros:

    #define ALWAYS_INLINE __attribute__((always_inline))
    #define NOINLINE __attribute__((noinline))
     
    Philip Lantz, Nov 25, 2012
    #16
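A sketch of how the two macros from post #16 could be applied to bytecode handlers, so that a frequently executed handler is inlined and a rarely executed one is kept out of line. The handler names, the opcode numbers and the empty fallback definitions for non-GNU compilers are assumptions added for illustration.

```c
#include <assert.h>

#if defined(__GNUC__)           /* gcc, and clang which accepts the same syntax */
#define ALWAYS_INLINE __attribute__((always_inline))
#define NOINLINE      __attribute__((noinline))
#else                           /* assumption: elsewhere, leave it to the compiler */
#define ALWAYS_INLINE
#define NOINLINE
#endif

/* Frequently executed handler: request inlining at every call site.
   It is also declared 'inline', since gcc may warn about an
   always_inline function that is not. */
static inline ALWAYS_INLINE int op_add(int a, int b)
{
    return a + b;
}

/* Rarely executed handler: keep it as a real call so that its body
   does not enlarge the dispatch function. */
static NOINLINE int op_divmod(int a, int b, int *rem)
{
    assert(b != 0);
    *rem = a % b;
    return a / b;
}

/* Tiny stand-in for the interpreter's switch (opcodes 0 and 1 are made up). */
static int dispatch(int op, int a, int b)
{
    int rem = 0;

    switch (op) {
    case 0:  return op_add(a, b);
    case 1:  return op_divmod(a, b, &rem) + rem;
    default: return 0;
    }
}
```

With this arrangement the default heuristics still apply to every handler that carries neither macro, so only the handlers known to be hot or rare need annotating.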
  17. On Nov 24, 7:21 pm, "BartC" <> wrote:
    > Strange things are happening with performance, such as being 60%
    > slower, when, for example, a function's body is populated with code (they
    > start off empty), even though the function is never actually called.


    The problem might be cache collisions, with two or more
    pieces of executing code or data having the same low-order
    address bits. The uncalled function is linked between two
    called functions whose relative addresses change.
    Two experiments to investigate this possibility:
    1. Relocate the offending uncalled function: either
    its position in a module, or the module's position in
    the load list.
    2. Try *increasing* the amount of code in the uncalled function:
    you'll eventually get back to a "sweet spot" where execution
    is fast again.

    Details may be complicated if your machine has two or more
    interacting caches. And a permanent fix may be difficult even
    if you identify such cache collisions as the cause of the slowdown.

    James
     
    James Dow Allen, Nov 25, 2012
    #17
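Post #17 suggests moving code around by hand. A different, gcc-specific way to influence placement, not proposed by anyone in the thread, is the `hot` and `cold` function attributes: gcc documents that cold functions are optimized for size and, on many targets, grouped in a separate part of the text section. The sketch below only shows the syntax; the handler names are hypothetical, and whether this helps with a particular slowdown has to be measured.

```c
#include <assert.h>

#if defined(__GNUC__)
#define HOT  __attribute__((hot))
#define COLD __attribute__((cold))
#else                           /* assumption: no equivalent, so expand to nothing */
#define HOT
#define COLD
#endif

/* Executed for almost every bytecode. */
static HOT int handle_increment(int x)
{
    return x + 1;
}

/* Executed rarely: resets a negative accumulator to zero. */
static COLD int handle_negative(int x)
{
    assert(x < 0);
    return 0;
}

/* One interpreter step that picks between the common and the rare path. */
static int step(int x)
{
    if (x < 0)
        return handle_negative(x);
    return handle_increment(x);
}
```

This leaves the inlining decision itself to the compiler; it can be combined with the `noinline` attribute when a rare handler must stay out of line.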
  18. BartC

    Jorgen Grahn Guest

    On Sat, 2012-11-24, Alain Ketterlin wrote:
    > "BartC" <> writes:
    >
    >> "Jorgen Grahn" <> wrote in message
    >> news:...

    ....
    >>> More generally, my own rules of thumb:
    >>> - things I'm quite sure will be useful to inline, and which
    >>> are called in several translation units:
    >>> -> 'static inline' in some header file


    I think BartC is responding to this by me, which he snipped:

    >>> - anything else:
    >>> -> 'static', and trust the compiler to choose wisely


    >> I use 'static' on all of the functions (because they are not exported).
    >> Perhaps I should look at that...

    >
    > static will make the compiler not emit code when all calls have been
    > inlined. It should not influence the inlining decision.


    Why not? Just the emit/not emit choice should influence the inlining
    decision, because this in turn influences code size.

    Let's say I have this in a translation unit:

    int foo(void) { /* pages of code */ }
    int bar(void) {
    int n = foo();
    /* other stuff */
    }

    I'd expect a compiler to have a conservative inlining mode which
    doesn't inline foo() in this case, to avoid code bloat.
    (Especially since you're almost certain to have foo() calls in other
    translation units, or you would have made it static.)

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
     
    Jorgen Grahn, Nov 25, 2012
    #18
  19. Jorgen Grahn <> writes:

    [...]
    >> static will make the compiler not emit code when all calls have been
    >> inlined. It should not influence the inlining decision.

    >
    > Why not?


    You're right, my phrasing was misleading. What I meant was: static will
    not *by itself* determine whether a function is always inlined or not.

    By the way, static should always be used when it applies, because it
    also lets the compiler emit code that is not strictly ABI-compliant,
    which may be faster.

    -- Alain.

    > Just the emit/not emit choice should influence the inlining decision,
    > because this in turn influences code size.
    >
    > Let's say I have this in a translation unit:
    >
    > int foo(void) { /* pages of code */ }
    > int bar(void) {
    > int n = foo();
    > /* other stuff */
    > }
    >
    > I'd expect a compiler to have a conservative inlining mode which
    > doesn't inline foo() in this case, to avoid code bloat.
    > (Especially since you're almost certain to have foo() calls in other
    > translation units, or you would have made it static.)
    >
    > /Jorgen
     
    Alain Ketterlin, Nov 25, 2012
    #19
  20. BartC

    BartC Guest

    "David Brown" <> wrote in message
    news:...
    > On 24/11/2012 16:33, BartC wrote:
    >
    >> There are various ways to tackle this, but it would be most convenient if
    >> there was an attribute 'never_inline' (I think there is one to always
    >> inline).


    > Perhaps what you are seeing is that all the inlining is increasing
    > the size of your inner loop so much that it is exceeding the L1
    > instruction cache, and thus you are seeing a slow-down.


    I know little about instruction caches. I wouldn't have thought all these
    functions together were that big (the whole program is only 120K and might
    be double that when finished). Would it help if the most common functions
    (or perhaps the smallest), were together? (But I don't even know if gcc
    reorders my functions anyway.)

    > A couple of other ideas spring to mind for the program. One is to
    > re-structure with a table of function pointers rather than a huge switch -
    > it might also make the program clearer and easier to work with.


    I'm testing three approaches: a giant switch statement, a table of label
    pointers (specific to gcc), and a table of function pointers. Some timings
    (for a set of combined benchmarks) are:

    Switch: 75 seconds (74 with switch range-check disabled)
    Label ptrs: 66 seconds
    Function ptrs: 69 seconds

    When I temporarily bring the 350 (non-static) functions back into the same
    file, and rely on whatever default inlining gcc decides, I get these
    results:

    Switch: 69 seconds
    Label ptrs: 57 seconds
    Function ptrs: 70 seconds

    With both Switch and Label pointers, there is a specific call in the source
    to each of the 350 functions, and inlining is possible. With function
    pointers, there is just one call, so inlining can't happen. So that's a big
    disadvantage. (While the sets of switch and label calls are anyway generated
    automatically.)

    (For comparison, an older version of this interpreter, with a large ASM
    component, benchmarked at 39 seconds. However the source code is a mess.

    The C version has a bit of catching up to do yet. However it's currently
    still better, typically, than the same programs executed under Perl, Python
    or Ruby. Although I cheat a bit by not having auto-ranging integer
    arithmetic in the C version; it's too big an overhead.)


    --
    Bartc
     
    BartC, Nov 26, 2012
    #20
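Two of the three dispatch methods compared in post #20 can be sketched side by side (the switch version being the usual loop around a `switch`). The opcode set and all names are invented; the label-pointer version relies on gcc's "labels as values" extension, which clang also accepts, and is therefore guarded by `__GNUC__`.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_INC, OP_DOUBLE, OP_HALT, OP_COUNT };

static int op_inc(int acc)    { return acc + 1; }
static int op_double(int acc) { return acc * 2; }

/* Table of function pointers: a single indirect call site, so the
   handlers cannot be inlined into the loop. */
static int run_function_table(const unsigned char *code, int acc)
{
    static int (*const handlers[OP_COUNT])(int) = { op_inc, op_double, NULL };

    for (; *code != OP_HALT; code++) {
        assert(*code < OP_COUNT);
        acc = handlers[*code](acc);
    }
    return acc;
}

#if defined(__GNUC__)
/* Table of label pointers: every handler has its own direct call in the
   source, so inlining is possible, and each handler jumps straight to
   the next one instead of returning to a central switch. */
static int run_label_table(const unsigned char *code, int acc)
{
    static void *const labels[OP_COUNT] = { &&do_inc, &&do_double, &&do_halt };

#define NEXT() do { assert(*code < OP_COUNT); goto *labels[*code++]; } while (0)
    NEXT();
do_inc:    acc = op_inc(acc);    NEXT();
do_double: acc = op_double(acc); NEXT();
do_halt:   return acc;
#undef NEXT
}
#endif
```

Both functions compute the same result for the same bytecode; for the program `OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT` starting from 0 they return 5. Any speed difference between them depends on the compiler, the options and the machine, so the timings quoted in the post should not be assumed to carry over.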
