What is the gain of "inline"

Keith Thompson

Stephen Sprunk said:
IIRC, GCC will automatically inline "small" functions, but "inline"
tells it to inline the function regardless of size. The size threshold
can be changed via command-line flag, but IMHO that's not as clean.

Also, I think GCC does not output a standalone function body if the
function is declared "static inline" but does if the function is merely
declared "static", even if every call to the latter gets inlined due to
heuristics.

So, "inline" is not (yet?) completely useless like "auto" or "register".

The "auto" keyword is completely useless (except perhaps as
documentation); it can only specify something that the compiler *must*
do in the absence of the keyword. It was useful in older versions of C,
where
    auto x;
    static y;
declared x and y as objects of type int (even there I'd prefer to
declare them as "int" explicitly, making the "auto" keyword
redundant).

The "register" keyword, on the other hand, is at least potentially
meaningful. The common wisdom is that the compiler can always do
a better job of register allocation than the programmer can, but
I'm not sure that's entirely true. It also has a semantic effect:
it forbids taking the address of the object. Even if the compiler
doesn't choose to store the object in a register, knowing that
its address is never computed might be helpful for optimization.
(Though since "register" can be applied only to objects with block
scope, a reasonably clever compiler should be able to determine
that for itself.)
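
As an illustrative sketch of that semantic effect (not part of the original post; the function name sum_to is made up for the example), the following declares two block-scope objects as "register". Uncommenting the marked line should draw a diagnostic, since applying & to a register object is a constraint violation:

```c
#include <assert.h>

/* Hypothetical helper: sums the integers 1..n.
   Both locals are declared "register", so their addresses
   can never be taken anywhere in this function. */
int sum_to(int n)
{
    assert(n >= 0);                 /* precondition for this sketch */

    register int total = 0;
    for (register int i = 1; i <= n; i++)
        total += i;

    /* int *p = &total; */          /* constraint violation if uncommented */

    return total;
}
```

Whether the objects actually end up in registers is still up to the compiler; the only guaranteed effect is the ban on taking their addresses.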
 
Eric Sosman

[...]
inline is mostly useful for "one-liners", where the overhead of the call
will be larger than that of the code in the function...

but, for code much larger than this (such as code with loops or conditionals
and stuff), the use of inline is ill-advised... (since IME, the cost of a
function call is not usually THAT much higher than that of a loop iteration,
for example...).

An effect I've not seen mentioned here is that expanding
a function body in-line may allow some parts to be "executed"
at compile time.

inline void foo(int which) {
    switch (which) {
    case 0: code-for-0; break;
    case 1: code-for-1; break;
    ...
    }
}

An in-line call to foo(1) might turn into code-for-1 all by
itself, with all the other cases -- the whole switch, in
fact -- removed as dead code. Not only do you eliminate the
time to get to the function and back again, but you may also
eliminate or simplify some of the code inside the function.
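
A runnable version of that idea might look like the sketch below. The names scale and twice, and the arithmetic in each case, are invented stand-ins for code-for-0, code-for-1 and so on; whether the switch is really removed depends on the compiler and the optimization level.

```c
#include <assert.h>

/* Hypothetical stand-in for foo(): 'which' selects one of
   several small computations. */
static inline int scale(int which, int x)
{
    assert(which >= 0 && which <= 2);   /* selector range for this sketch */

    switch (which) {
    case 0:  return x;          /* stands in for code-for-0 */
    case 1:  return x * 2;      /* stands in for code-for-1 */
    case 2:  return x * x;      /* stands in for a third case */
    default: return 0;
    }
}

/* The selector is the constant 1 here, so once scale() is expanded
   in-line an optimizer is free to reduce the body to 'x * 2' and
   discard the other cases as dead code. */
int twice(int x)
{
    return scale(1, x);
}
```

Comparing the assembly for twice() with and without optimization (for example via the compiler's option to emit assembly) is one way to see whether the dead cases were dropped.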
 
Seebs

"inline" can't really suggest anything else but to inline the called
function. Otherwise you still have to set up a stack frame and pay the
cost of calling a function.

Not necessarily the case. There exist systems on which it is possible
for the cost of calling a function to be essentially zero if the function
meets certain criteria -- in which case, the compiler might choose to
tweak the implementation of the function to meet those criteria, then continue
emitting calls to it. If the calls are smaller than the function body,
but the call overhead is functionally zero, this may even IMPROVE performance
in some cases.

So it really is just "please be fast".

I'm saying that in most modern compilers you will get function
inlining whether you use the keyword or not. So it's largely
academic.

True.

It's like saying "auto" on stack variables...

Not really. That definitively changes nothing, whereas specifying inline
may cause a compiler to try to inline (or otherwise speed up) a function
it otherwise might not have.

-s
 
Ben Pfaff

One additional effect of "inline" with the compiler that I use is
to suppress warnings about a function that is defined but never
used. This is presumably because "static inline" functions are
often defined in header files and thus may not be used by every
translation unit that includes them, whereas it is usually an
oversight if an ordinary "static" function is never called.
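
A small sketch of the situation described, assuming GCC, whose documentation describes -Wunused-function as covering unused non-inline static functions. The helper names, and the idea that they come from a shared header, are hypothetical:

```c
#include <assert.h>

/* Imagine these three definitions live in a shared header
   (say "util.h", a made-up name) included by many source files. */
static inline int min_int(int a, int b) { return a < b ? a : b; }
static inline int max_int(int a, int b) { return a > b ? a : b; }
static inline int abs_int(int a)        { return a < 0 ? -a : a; }

/* This translation unit uses min_int and max_int only.
   abs_int is defined but never called here; because it is
   "static inline", GCC is not expected to complain about it,
   whereas a plain "static" definition would normally be
   reported as unused under -Wall. */
int clamp(int value, int lo, int hi)
{
    assert(lo <= hi);               /* precondition for this sketch */
    return min_int(max_int(value, lo), hi);
}
```

Other compilers may apply different rules, so this is best treated as a description of one compiler's behaviour rather than something the language guarantees.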
 
Kaz Kylheku

Could someone explain "inline" for me?

Making a function an inline function suggests that
calls to the function be as fast as possible.

Not quite. [...]

Ah, this is obviously some strange usage of the phrase "not
quite" that I wasn't previously aware of. See ISO/IEC 9899:1999,
Section 6.7.4, paragraph 5, third sentence.

CLC drone,

You might want to acquaint yourself with this idea of ``context'', which
has to do with why there is a first sentence, and a second sentence, and
possibly a fourth one, and so on.

Something ripped out of context is, in fact, ``not quite'' that
thing in its proper context.
 
Phil Carmody

Well, it /is/ an issue (regarding speed) whether the code
will fit into the processor cache or not.

(Recently someone wrote that fast RAM has always been 32 KB:
it was when he bought his PET 2001 back in 1977, and it is
today, when it is called »L1 cache«.)

((Where »fast RAM« is defined as RAM the processor can
access at its native speed.))

In order to fetch arbitrary memory, the 6502 had to spend twice as
long interfacing with the address bus as it did when accessing
zero page RAM. So the actual fast memory was only 256 bytes.

Phil
 
Phil Carmody

Kaz Kylheku said:
On Dec 9, 7:45 am, Stefan Ram wrote:
Could someone explain "inline" for me?

Making a function an inline function suggests that
calls to the function be as fast as possible.

Not quite. [...]

Ah, this is obviously some strange usage of the phrase "not
quite" that I wasn't previously aware of. See ISO/IEC 9899:1999,
Section 6.7.4, paragraph 5, third sentence.

CLC drone,

Just because you do not like the fact that a group of people often
agree with each other, and often disagree with you, does not make
them drones. If you are unable to enter into polite discourse with
those who show a preference for such, then perhaps you should
consider not partaking in any discourse at all.

You might want to acquaint yourself with this idea of ``context'', which
has to do with why there is a first sentence, and a second sentence, and
possibly a fourth one, and so on.

Something ripped out of context is, in fact, ``not quite'' that
thing in its proper context.

And precisely which bit of the first, second, and fourth sentences
in any way modify the payload of the third sentence? If none, then
why are you wasting people's time drawing their attention to them?

Phil
 
Antoninus Twink

If you are unable to enter into polite discourse with those who show a
preference for such, then perhaps you should consider not partaking in
any discourse at all.

Oh. The. Irony.

Phil "the Psycho" Carmodey lecturing other people on politeness - I've
heard it all now.
 
Phil Carmody

Eric Sosman said:
[...]
inline is mostly useful for "one-liners", where the overhead of the call
will be larger than that of the code in the function...

but, for code much larger than this (such as code with loops or conditionals
and stuff), the use of inline is ill-advised... (since IME, the cost of a
function call is not usually THAT much higher than that of a loop iteration,
for example...).

An effect I've not seen mentioned here is that expanding
a function body in-line may allow some parts to be "executed"
at compile time.

inline void foo(int which) {
    switch (which) {
    case 0: code-for-0; break;
    case 1: code-for-1; break;
    ...
    }
}

An in-line call to foo(1) might turn into code-for-1 all by
itself, with all the other cases -- the whole switch, in
fact -- removed as dead code. Not only do you eliminate the
time to get to the function and back again, but you may also
eliminate or simplify some of the code inside the function.

A quick test (of an inline function more complicated than that,
but still with compile-time calculation of the return value)
indicates that gcc 2.7.2 was doing this even back in 1995. I'm
pretty sure Borland was not able to do it in 1993, and wonder
when this intelligent inlining became mainstream. Anyone
got a 2.5, 2.4, or earlier gcc to hand?

Phil
 
jacob navia

Ben Bacarisse wrote:
Maybe you can give an example where inlining slows down the code. I'd
like to see what sort of code leads you to this conclusion.

Inline will in general be a bad choice when the size of the inlined function's body
is bigger than the calling sequence of the normal call, i.e. when there is an
increase in the size of the code.

What is important to notice in this context is that the speed of
RAM is VERY slow compared to the speed of the in-cache RAM and
the speed of the processor. The more code you get into the code cache
the faster your program will run.

Inlining, especially in C++, is a nightmare since it tends to be used
too much, and in situations where it shouldn't be used at all. C++
mandates inline in many contexts, and in the company where I work
this bloats the code enormously.

Obviously I can't send you the 40-50MB executable for you to
inspect it, sorry.

lcc-win optimizes only for code size, and using only this optimization,
speed has been greatly increased.

jacob
 
jacob navia

Kaz Kylheku wrote:
Suppose I had a function like this in a shared library on MIPS:

int get_foo_count(struct bar *bar)
{
    return bar->foo_count;
}

calling this will be almost certainly longer than just doing the access
directly with inline code (macro or inline function).
See, computing the address of a function in a shared library and calling
it is more complicated than accessing a structure member.

Sure, in these cases inline can be beneficial, but such cases
tend to come up in C++ more than in C.

Even if the function is not in a shared library, it may still be
more instructions to do the call. Suppose I add the call into a leaf
function. Now that function becomes a caller to a callee, and has
new responsibilities: namely saving all of the caller-saved registers!

So what?

The point is whether the code gets bigger or not. That is the main criterion.
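
For reference, the accessor being argued over could be written as a header-style inline function along the lines below. This is only a sketch: the struct layout, the const qualifier and the caller total_foo are assumptions added for illustration, and which version produces smaller code has to be checked per target.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout; the original post only shows the foo_count member. */
struct bar {
    int foo_count;
};

/* Inline form of the accessor: when expanded in place, the call
   typically collapses to a single load of the member. */
static inline int get_foo_count(const struct bar *bar)
{
    assert(bar != NULL);
    return bar->foo_count;
}

/* Hypothetical caller. With the accessor inlined, this function
   makes no calls in its loop and can remain a leaf function. */
int total_foo(const struct bar *bars, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += get_foo_count(&bars[i]);
    return total;
}
```

Here the inlined body (one load) is plausibly no larger than a call sequence, so this is a case where the speed argument and the code-size argument may point the same way.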
 
jacob navia

Kaz Kylheku wrote:
Remember, boys and girls, this is from someone who thinks that the
stack-blowing idiocy known as variable length arrays is a good idea!

I think that VLAs are a good construct proposed by the C standard.
I have (in comp.std.c++) argued that they decrease stack usage,
since you use only what you need rather than some arbitrary maximum size.

Calling it "stack blowing" is the contrary of the effect of a VLA.

But that is another discussion.

What your example of that old RISC architecture proves is just
that in some circumstances inline can be useful.

My post was directed at provoking a discussion and showing an
alternative way of looking at some established "dogmas".

jacob
 
jacob navia

Ian Collins wrote:
I don't have a figure for the ratio of embedded devices to desktops and
servers, but I'm sure it's high enough to invalidate that statement.

If you use even an old CPU, it is highly unlikely that it will be
slower than 1 GHz. RAM in embedded devices is way slower than
workstation RAM too, so the ratio should stay the same or be even worse
than on workstations.
 
Keith Thompson

jacob navia said:
Ben Bacarisse wrote:

Inline will in general be a bad choice when the size of the inlined
function's body is bigger than the calling sequence of the normal
call, i.e. when there is an increase in the size of the code.

That's a rather extreme claim. (Though the phrase "in general" is
ambiguous; it can mean either "usually" or "always". Which did you
mean?)

Here's a somewhat contrived counterexample. Suppose you have a tight
inner loop, executed bajillions of times, containing a function call.
If the inlined code for the function is smaller than the calling
sequence, then inlining almost certainly makes sense; it saves both
time and space.

But suppose inlining the function increases (say, doubles or triples)
the size of the code for the loop, but the entire loop *still* fits in
cache. In that case, I would think that inlining would improve speed
at the cost of some increase in code size.

[...]
lcc-win optimizes only for code size, and using only this optimization,
speed has been greatly increased.

Ok, but I wonder if you could get even better speed with a slightly
less lopsided optimization strategy.
 
jacob navia

kyle york wrote:
Really? I program PIC Microcontrollers where the bus speed is usually
around 4MHz. I'd suggest in the average house embedded devices such as
these far outnumber the system you describe.

Sure, and in a PIC with 128 bytes of RAM you want
to use inline.

GREAT, pal! GO ON LIKE THIS. Do not fear being
spotted as somebody who writes before turning on the brain.


:)
 
jacob navia

Antoninus Twink wrote:
Oh. The. Irony.

Phil "the Psycho" Carmodey lecturing other people on politeness - I've
heard it all now.

Hypocrite
–noun
1. a person who pretends to have virtues, moral or religious beliefs,
principles, etc., that he or she does not actually possess, esp. a
person whose actions belie stated beliefs.

2. a person who feigns some desirable or publicly approved attitude,
esp. one whose private life, opinions, or statements belie his or her
public statements.
 
Flash Gordon

jacob said:
Ian Collins wrote:

If you use even an old CPU, it is highly unlikely that it will be
slower than 1 GHz.

Not if it's for an embedded application.

RAM in embedded devices is way slower than
workstation RAM too, so the ratio should stay the same or be even worse
than on workstations.

Definitely not always. Sometimes people pick fast RAM and slow
processors (sometimes even underclocking the processor) so that RAM
access is single cycle. This can help minimise power consumption for a
given speed of processing since you are powering the devices just for
work and not for sitting in wait states.
 
Ben Bacarisse

jacob navia said:
Ben Bacarisse wrote:

Inline will in general be a bad choice when the size of the inlined function's body
is bigger than the calling sequence of the normal call, i.e. when there is an
increase in the size of the code.

This is just an assertion. I was hoping you could post a C example.
Maybe I have always used machines where inlining pays greater
dividends.

In one example I've just tried, the inlined version is always faster
than the non-inlined one, and the inlined code is also smaller at higher
optimisation settings.

What is important to notice in this context is that the speed of
RAM is VERY slow compared to the speed of the in-cache RAM and
the speed of the processor. The more code you get into the code cache
the faster your program will run.

Empirical evidence of faster code when inlining is used needs to be
explained. If you gave me an example of code that was slower, I might
see what the difference is. You've formed this opinion for a reason
and I'd like to know why you've reached a different conclusion from me.

I'll note also that the gcc implementors seem to think that inlining
helps. gcc does it even when not explicitly requested in the code
meaning I've had to work hard to prevent it in order to do some
timings.

Inlining, especially in C++, is a nightmare since it tends to be used
too much, and in situations where it shouldn't be used at all. C++
mandates inline in many contexts, and in the company where I work
this bloats the code enormously.

Obviously I can't send you the 40-50MB executable for you to
inspect it, sorry.

I don't want a non-C example. Do you have one in C, or does your
remark only apply to languages like C++?

lcc-win optimizes only for code size, and using only this optimization,
speed has been greatly increased.

That does not show that it could not be faster still if you included
inlining. I am not saying it would be, of course, but I'd like an
example that I can test to see where the problem really is.
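
One way to build such a test, as a sketch only: the two functions below do identical work, but one helper carries the GCC/Clang noinline attribute so that its call cannot be inlined away. The hash step and all names are invented for illustration; timing the two loops (for example with clock() around many repetitions) is left out, since results vary by machine and compiler.

```c
#include <assert.h>
#include <stddef.h>

/* noinline is a GCC/Clang extension; on other compilers this
   macro expands to nothing and the comparison loses its meaning. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Same computation twice: one version the compiler may inline,
   one it is told not to. */
static inline unsigned step_inline(unsigned acc, unsigned x)
{
    return acc * 31u + x;
}

static NOINLINE unsigned step_call(unsigned acc, unsigned x)
{
    return acc * 31u + x;
}

/* Tight loop using the inlinable helper. */
unsigned hash_inline(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = step_inline(acc, p[i]);
    return acc;
}

/* Identical loop, but every iteration makes a real call. */
unsigned hash_call(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    unsigned acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = step_call(acc, p[i]);
    return acc;
}
```

Both functions must return the same value for the same input, which gives a cheap sanity check before any timing is attempted; comparing object-code size for the two loops addresses the code-size side of the argument.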
 
