What is the gain of "inline"

BGB / cr88192

Eric Sosman said:
[...]
inline is mostly useful for "one-liners", where the overhead of the call
will be larger than that of the code in the function...

but, for code much larger than this (such as code with loops or conditionals
and stuff), the use of inline is ill-advised... (since IME, the cost of a
function call is not usually THAT much higher than that of a loop iteration,
for example...).

An effect I've not seen mentioned here is that expanding
a function body in-line may allow some parts to be "executed"
at compile time.

inline void foo(int which) {
    switch (which) {
    case 0: code-for-0; break;
    case 1: code-for-1; break;
    ...
    }
}

An in-line call to foo(1) might turn into code-for-1 all by
itself, with all the other cases -- the whole switch, in
fact -- removed as dead code. Not only do you eliminate the
time to get to the function and back again, but you may also
eliminate or simplify some of the code inside the function.

possibly the case, but I am not sure this is likely to be as
significant/frequent in practice...
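
The dead-code effect described in the quoted text can be sketched in a few lines of C. This is only an illustration: `scale_by_kind`, `double_it`, `dispatch` and `check_scale_by_kind` are hypothetical names, not anything from the thread, and whether a given compiler actually removes the switch depends on the compiler and its optimization settings.

```c
#include <assert.h>

/* Hypothetical stand-in for foo(): each case does different work. */
static inline int scale_by_kind(int kind, int x)
{
    switch (kind) {
    case 0:  return x + 1;
    case 1:  return x * 2;
    case 2:  return x * x;
    default: return 0;
    }
}

/* 'kind' is the constant 1 at this call site, so after inlining an
   optimizing compiler may drop the switch and keep only x * 2. */
int double_it(int x)
{
    return scale_by_kind(1, x);
}

/* Here 'kind' is only known at run time, so the switch has to stay. */
int dispatch(int kind, int x)
{
    return scale_by_kind(kind, x);
}

/* Sanity check: both paths must agree whatever the optimizer does. */
void check_scale_by_kind(void)
{
    assert(double_it(21) == 42);
    assert(dispatch(1, 21) == double_it(21));
    assert(dispatch(2, 5) == 25);
}
```

Comparing the generated assembly for `double_it` and `dispatch` at a given optimization level is one way to see how much of the body survives in each case.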

 
BGB / cr88192

pete said:
It's for people who don't like macros.

I use both, but which is used depends more on the specific behaviors...

a notable weak point of using macros is that they may end up evaluating the
arguments multiple times, which can either break or foul up some use cases,
as well as slowing down others.

so, there are reasons one might want the semantics of avoiding multiple
evaluations, while still having an overhead similar to that of a macro.
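
A minimal sketch of the multiple-evaluation problem, assuming a hypothetical side-effecting argument `next_value()` that counts how often it is called. The names `SQUARE_MACRO`, `square_inline` and the helper functions are made up for the illustration.

```c
#include <assert.h>

static int calls;               /* counts how often next_value() ran */

/* Hypothetical argument expression with a visible side effect. */
static int next_value(void)
{
    return ++calls;
}

/* Function-like macro: the argument text is pasted in twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline function: the argument is evaluated once, then passed by value. */
static inline int square_inline(int x)
{
    return x * x;
}

/* Returns how many times the argument was evaluated by the macro. */
int calls_made_by_macro(void)
{
    calls = 0;
    (void)SQUARE_MACRO(next_value());
    return calls;
}

/* Returns how many times the argument was evaluated by the function. */
int calls_made_by_inline(void)
{
    calls = 0;
    (void)square_inline(next_value());
    return calls;
}

/* Sanity check: the macro evaluates its argument twice, the function once. */
void check_evaluation_counts(void)
{
    assert(calls_made_by_macro() == 2);
    assert(calls_made_by_inline() == 1);
}
```

With an argument such as `i++` instead of a function call, the macro version would not just evaluate twice but have undefined behavior, which is the "break" case mentioned above.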

 
BGB / cr88192

Kaz Kylheku said:
Remember, boys and girls, this is from someone who thinks that the
stack-blowing idiocy known as variable length arrays is a good idea!

I don't personally implement VLAs, mostly since I don't use them, and
secondly because they would be a problem to implement with my current
compiler (if they are to be, in fact, located on the stack).

part of the reason for this is that my compiler generally keeps track of
where everything is on the stack, and so depends on having the stack layout
fixed at compile time. variable-length objects would pose a problem in that
one can no longer statically calculate all their stack offsets.

Smart use of inline speeds up programs considerably. Some small
functinos can be replaced by an instruction sequence which is as short
as the function call.

I work on GNU/Linux running on MIPS. In userland, function calls are
gross. They have to ensure that the $gp register has the correct value,
load some offsets from the global offset table and then do an indirect
branch through the $t9 register.

For instance, the puts call in this:

#include <stdio.h>

int main(void)
{
    puts("hello");
    return 0;
}

turns into this:


Fetch the global pointer:

lui $28,%hi(%neg(%gp_rel(main)))
addu $28,$28,$25
addiu $28,$28,%lo(%neg(%gp_rel(main)))

Now go into the global offset table to figure
out where puts is, and begin the calculation
of where the string literal "hello" is:

lw $4,%got_page(.LC0)($28)
lw $25,%call16(puts)($28)

Save our caller's return address.

sd $31,8($sp)

Finally do the call.

jal $25

But not quite; in the branch delay slot, complete calculating the address of
the string literal:

addiu $4,$4,%got_ofst(.LC0)

Phew! It's definitely worth inlining a function that can be done in a few
instructions!

it is not so good on x86-64 either, since a function call may involve:
having to spill any values held in caller-save/scratch registers;
having to get arguments into the correct registers (this part is itself a little "painful" with SysV);
doing the call;
maybe having to reserve stack space and spill the arguments (almost invariably the case in non-leaf functions, given the arguments are passed in scratch registers);
...


So does a function-like macro. Only the inline function is type safe.


Can you put a number on this, like 75.3% of the cases?

What sampling method is used, over what kind of data, to arrive at the
statistic?

yeah, I think he is thinking most inline functions are large...


this is maybe about the same as me thinking that most functions are non-leaf
functions...
but, then again, I have written enough code to be almost certain that this
is the case:
only a rare minority of code is leaf functions;
and there is very little to say that it is the leaf functions which will be
eating the running time.

What if the inline code is bloated, but it's in a tight loop that fits
nicely into the cache?


No it isn't; see MIPS code above.

yeah.
depends on the arch...
personally I suspect that function calls are cheaper on x86 than on x86-64
(SysV and Win64), since there are so few registers that there is not nearly
so much worry about the cost of spills.

Typically, shared libraries always use indirect jumps.

yeah, or at least with ELF and friends...

with PE/COFF, only non-local calls are indirect (or at least on x86 and
x86-64, I know little about MIPS...).

See use of branch delay slot in MIPS code; something can be put into the
pipeline even though a branch is happening. (Though this is now part
of the instruction set architecture and behaves the same way regardless
of whether there actually is a branch delay slot, or how large it is;
if the hardware implementation has a two cycle stall in the pipeline for a
branch, you still get just one slot to fill).

ok.
 
Stephen Sprunk

jacob said:
Ben Bacarisse wrote:

Inline will be a bad choice in general when the size of the inlined
function's body is bigger than the calling sequence of the normal call.
I.e. when there is an increase in the size of the code.

.... unless the particular arguments being passed to the inline function
allow the compiler to eliminate most of the function body as dead code.
What is important to notice in this context is that the speed of
RAM is VERY slow compared to the speed of the in-cache RAM and
the speed of the processor. The more code you get into the code cache
the faster your program will run.

All of that is true, but the function body has to be loaded into cache
whether or not it's inlined. The _only_ time this matters, then, is if
the parent function calls the leaf function multiple times, each call is
inlined, and there are not enough dead code or common code eliminations
to make up for the increased code size.

S
 
Ian Collins

jacob said:
Ian Collins wrote:

If you use even an old CPU, it is highly unlikely that it will be
less than 1 GHz. RAM in embedded devices is way slower than
workstation RAM too, so the ratio should stay the same as, or be even
worse than, on workstations.

Well, the fastest of our layer 3 switching products uses an 800 MHz PowerPC;
most use 500 or 666 MHz parts. They also use standard PC DDR2 RAM. So
wrong on both counts.
 
Ian Collins

jacob said:
Kaz Kylheku wrote:

Sure, in these cases inline can be beneficial, but these cases
tend to come up in C++ more than in C.

Why? Some of us prefer small functions in C as well. Yes, the style
used to be more common in C++, but modern C compilers have caught up
with C++ compilers' ability to automatically inline appropriate functions.

Do remember the inline keyword is little more than a waste of pixels
these days.
So what?

The point is whether the code gets bigger or not. That is the main criterion.

On some platforms that may gain you speed, on others it may not.
 
Nick

jacob navia said:
Kaz Kylheku wrote:
I think that VLAs are a good construct proposed by the C standard.
I have (in comp.std.c++) argued that it decreases the stack usage
since you use only what you need and not some arbitrary maximum size.

VLAs don't have to be done on the stack do they? Surely a compiler
could replace:

int bla(size_t sz) {
    char p[sz];
    /* do lots of things on p - reading that many characters from a file
       and hashing them, say */
    return result;
}

with

int bla(size_t sz) {
    char *p;
    p = malloc_with_somthing_on_failure(sz);
    /* do lots of things on p - reading that many characters from a file
       and hashing them, say */
    free(p);
    return result;
}

?
 
Nick Keighley

Antoninus Twink wrote:



Hypocrite
–noun
1. a person who pretends to have virtues, moral or religious beliefs,
principles, etc., that he or she does not actually possess, esp. a
person whose actions belie stated beliefs.

2. a person who feigns some desirable or publicly approved attitude,
esp. one whose private life, opinions, or statements belie his or her
public statements.

interesting. The term is often misused then. The charge of hypocrisy
is often levelled at people who offer advice that they themselves
don't follow whilst the actual meaning is more subtle.
 
gwowen

C++ mandates inline in many contexts, and in the company where
I work this bloats the code enormously.

Really? I can only think of one place where C++ mandates inline (in
the absence of an explicit "inline"), and that's when the definition of a
member function is included in the class definition. What are the
others?
Hypocrite
–noun
1.   [Definitions elided].

Gee. That's interesting. Would you apply that definition to someone
who excoriates others for making personal attacks, and then does the
same themselves?
 
TonyMc

Kaz Kylheku said:
Smart use of inline speeds up programs considerably. Some small
functinos can be replaced by an instruction sequence which is as short
as the function call.

I often think Kaz's posts are unnecessarily abrasive, but for coining
the word functino to mean "small function" I can forgive a great deal.
I do hope that word becomes accidentally adopted.

Tony
 
Stefan Ram

BTW: In C++, »inline« has a different meaning than in C
(according to ISO/IEC 14882:2003(E) versus ISO/IEC 9899:1999 (E)).
 
Stefan Ram

gwowen said:
Really? I can only think of one place where C++ mandates inline (in

BTW: In C++, »inline« has a different meaning than in C
(according to ISO/IEC 14882:2003(E) versus ISO/IEC 9899:1999 (E)).
 
gwowen

I can only think of one place where C++ mandates inline (in
the absence of an explicit "inline"), and that's when the definition of a
member function is included in the class definition. What are the
others?

Reference to §7.1.2 [dcl.fct.spec] in the C++ standard confirms this.
A function in C++ is never mandated to be inline. The keyword is only
ever a hint, but member functions defined within a class definition are
implicitly specified as if declared "inline".
 
Eric Sosman

Eric Sosman said:
An effect I've not seen mentioned here is that expanding
a function body in-line may allow some parts to be "executed"
at compile time.
[...]

possibly the case, but I am not sure this is likely to be as
significant/frequent in practice...

Settling the question (either way) would require a good
deal of research. Still, my un-researched impression is that
compilers are pretty aggressive optimizers nowadays, and that
an inlined function body might afford the optimizer more room
to maneuver than the "outlined" version. The compiler working
on an inlined call has information not available to a compiler
building code for a general case. And even if there are no
shortcuts from dead code elimination or from folding in of
known-to-be-constant arguments, there's the possibility of
"cross-talk" between the function body and the caller that
embeds it: Common sub-expression elimination, use of registers
to cache values used in both caller and callee, ...

The opportunities are certainly present. The extent to
which they're used is situational -- and is the hard part to
assess in a general way.
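
As a rough illustration of the "cross-talk" mentioned above, here is a sketch in which the caller and an inlined helper both compute `w * h`. The names `plate_weight`, `plate_cost` and `check_plate_cost` are hypothetical, and whether a particular compiler really shares the product depends on the compiler and its settings; the C source only makes the opportunity visible.

```c
#include <assert.h>

/* Hypothetical helper: weight of a rectangular plate. */
static inline int plate_weight(int w, int h, int density)
{
    return w * h * density;
}

/* Caller and callee both need w * h. Once the helper is inlined, the
   optimizer sees both uses in one function body and may compute the
   product once and keep it in a register. */
int plate_cost(int w, int h, int density, int price_per_area)
{
    int material = w * h * price_per_area;       /* caller's use of w * h */
    int shipping = plate_weight(w, h, density);  /* callee's use of w * h */
    return material + shipping;
}

/* Sanity check: 2*3*7 + 2*3*5 = 42 + 30. */
void check_plate_cost(void)
{
    assert(plate_cost(2, 3, 5, 7) == 72);
}
```

If `plate_weight` lived in another translation unit and was called out of line, the caller would have no way to reuse its own `w * h` inside the callee without link-time optimization.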
 
Tom St Denis

One additional effect of "inline" with the compiler that I use is
to suppress warnings about a function that is defined but never
used.  This is presumably because "static inline" functions are
often defined in header files and thus may not be used by every
translation unit that includes them, whereas it is usually an
oversight if an ordinary "static" function is never called.

Code doesn't go in header files.

That is all.

Tom
 
Stephen Sprunk

Nick said:
VLAs don't have to be done on the stack do they? Surely a compiler
could replace:

int bla(size_t sz) {
    char p[sz];
    /* do lots of things on p - reading that many characters from a file
       and hashing them, say */
    return result;
}

with

int bla(size_t sz) {
    char *p;
    p = malloc_with_somthing_on_failure(sz);
    /* do lots of things on p - reading that many characters from a file
       and hashing them, say */
    free(p);
    return result;
}

?

Such a simple example appears to comply with the as-if rule, but the
situation gets more complicated when there are multiple potential exits
from the function, including code paths not using "return".

The easiest way to get "automatic storage duration" is to put a VLA in
the same place as all other objects of "automatic storage duration",
i.e. the stack (on machines that have such). In a sense, VLAs are a
more portable way of providing alloca().

S
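
To make the "multiple potential exits" point concrete, here is a hedged sketch comparing a VLA with a hand-written heap replacement. The functions `count_vowels_vla`, `count_vowels_heap` and `is_vowel` are invented for the example, the `-1` and `-2` return codes are arbitrary, and the code assumes a compiler that supports VLAs (they are optional from C11 onwards). It does not cover exits such as `longjmp`, which a heap-based replacement would also have to handle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int is_vowel(char c)
{
    switch (c) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 1;
    default:
        return 0;
    }
}

/* VLA version: the scratch buffer disappears on every exit path. */
int count_vowels_vla(const char *s, size_t n)
{
    if (n == 0)
        return 0;
    char buf[n];                 /* automatic storage, sized at run time */
    memcpy(buf, s, n);
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '!')
            return -1;           /* early exit: nothing to release */
        count += is_vowel(buf[i]);
    }
    return count;
}

/* Heap version: every exit path has to release the buffer itself. */
int count_vowels_heap(const char *s, size_t n)
{
    if (n == 0)
        return 0;
    char *buf = malloc(n);
    if (buf == NULL)
        return -2;               /* allocation failure is a new exit path */
    memcpy(buf, s, n);
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '!') {
            free(buf);           /* forgetting this would leak */
            return -1;
        }
        count += is_vowel(buf[i]);
    }
    free(buf);
    return count;
}

/* Sanity check: both versions agree on normal and early-exit inputs. */
void check_count_vowels(void)
{
    assert(count_vowels_vla("inline", 6) == count_vowels_heap("inline", 6));
    assert(count_vowels_vla("a!b", 3) == count_vowels_heap("a!b", 3));
}
```

A compiler doing this rewrite automatically would have to insert the equivalent of each `free()` itself, which is part of why placing VLAs on the stack is the easier implementation.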
 
Kaz Kylheku

Kaz Kylheku wrote:
I think that VLAs are a good construct proposed by the C standard.
I have (in comp.std.c++) argued that it decreases the stack usage
since you use only what you need and not some arbitrary maximum size.

Yeah, but you neglected to respond to the excellent counter-arguments to
that.
 
