Kaz Kylheku said:
Remember, boys and girls, this is from someone who thinks that the
stack-blowing idiocy known as variable length arrays is a good idea!
I don't personally implement VLAs, firstly because I don't use them, and
secondly because they would be a problem to implement with my current
compiler (if they are, in fact, to be located on the stack).
Part of the reason for this is that my compiler generally keeps track of
where everything is on the stack, and so depends on the stack layout being
fixed at compile time. Variable-length objects would pose a problem in that
one can no longer statically calculate all the stack offsets in the frame.
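A minimal C sketch of why a VLA defeats a fixed frame layout, assuming a C99 compiler (or a C11 one that does not define __STDC_NO_VLA__). The function names vla_bytes and sum_squares_heap are hypothetical, made up for illustration. The size of buf, and therefore the frame size, is only known once n arrives at run time; the heap version keeps the frame layout fixed and cannot blow the stack for a large n.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns the size of a VLA. sizeof on a VLA is
   evaluated at run time, because the compiler cannot know n in advance.
   n must be > 0; a zero-length VLA is undefined behaviour. */
size_t vla_bytes(size_t n)
{
    int buf[n];          /* storage size decided at run time */
    return sizeof buf;   /* n * sizeof(int), computed at run time */
}

/* Hypothetical heap-based alternative: the stack frame stays a fixed size,
   and an allocation failure is reported (-1) instead of overflowing the stack. */
long sum_squares_heap(size_t n)
{
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = (int)i;
        total += (long)buf[i] * buf[i];
    }
    free(buf);
    return total;
}
```

For example, vla_bytes(10) yields 10 * sizeof(int), and sum_squares_heap(10) yields 285 (0 + 1 + 4 + ... + 81).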
Smart use of inline speeds up programs considerably. Some small
functions can be replaced by an instruction sequence which is as short
as the function call.
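As an illustration of such a small function, here is a sketch of a 32-bit rotate (rotl32 is a hypothetical name). GCC and Clang commonly recognize this pattern and emit a single rotate instruction on targets that have one, so the inlined body can be shorter than the call sequence it replaces; whether a given compiler actually inlines it is up to that compiler.

```c
#include <stdint.h>

/* Rotate x left by r bits. Masking r keeps both shift counts in 0..31,
   so neither shift is undefined, including the r == 0 case. */
static inline uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31u;
    return (x << r) | (x >> ((32u - r) & 31u));
}
```

For example, rotl32(0x12345678u, 8) gives 0x34567812u.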
I work on GNU/Linux running on MIPS. In userland, function calls are
gross. They have to ensure that the $gp register has the correct value,
load some offsets from the global offset table and then do an indirect
branch through the $t9 register.
For instance, the puts call in this:
    #include <stdio.h>

    int main(void)
    {
        puts("hello");
        return 0;
    }
turns into this:
Fetch the global pointer:

    lui     $28,%hi(%neg(%gp_rel(main)))
    addu    $28,$28,$25
    addiu   $28,$28,%lo(%neg(%gp_rel(main)))

Now go into the global offset table to figure out where puts is, and begin
the calculation of where the string literal "hello" is:

    lw      $4,%got_page(.LC0)($28)
    lw      $25,%call16(puts)($28)

Save our caller's return address:

    sd      $31,8($sp)

Finally, do the call:

    jal     $25

But not quite; in the branch delay slot, finish calculating the address of
the string literal:

    addiu   $4,$4,%got_ofst(.LC0)
Phew! It's definitely worth inlining a function that can be done in a few
instructions!
It is not so good on x86-64 either, since a function call may involve:
having to spill any values held in caller-save (scratch) registers;
having to get the arguments into the correct registers (this part itself is a
little "painful" with SysV);
doing the call;
maybe having to reserve stack space and spill the arguments (almost
invariably so in non-leaf functions, given that the arguments are passed in
scratch registers);
...
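A small C sketch of the spill problem described above (scale and combine are hypothetical names). Assuming scale() is compiled separately or otherwise not inlined, the variable a in combine() is live across the call, so the compiler typically has to keep it in a callee-saved register (saving and restoring that register in the prologue and epilogue) or spill it to the stack, and it must also move b and 3 into the argument registers (edi/esi under SysV, ecx/edx under Win64). The exact code generated depends on the compiler and flags.

```c
/* Callee: a leaf function, so it needs no spills of its own. */
int scale(int x, int factor)
{
    return x * factor;
}

/* Caller: `a` must survive the call to scale(). If the call is not inlined,
   `a` cannot simply stay in a scratch register, because the callee is free
   to overwrite those. */
int combine(int a, int b)
{
    int t = scale(b, 3);
    return a + t;
}
```

If scale() is inlined, the call, the argument moves, and the save/spill of a all disappear, leaving just a multiply and an add.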
So does a function-like macro. Only the inline function is type safe.
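A sketch contrasting the two, with hypothetical names (MAX_MACRO, max_int, next_value, call_count). The macro does no type checking and pastes its arguments textually, so an argument with side effects can be evaluated twice; the inline function converts its arguments to int like any other call and evaluates each exactly once.

```c
/* Illustration only: counts how many times next_value() has been called. */
int call_count = 0;

int next_value(void)
{
    return ++call_count;
}

/* Function-like macro: arguments are substituted textually, so `a` or `b`
   may be evaluated twice, and operands keep their own types (subject to
   the usual arithmetic conversions, not to any parameter type). */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline function: arguments are converted to int and evaluated once;
   passing, say, a pointer draws a diagnostic. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}
```

Starting from call_count == 0, MAX_MACRO(next_value(), 0) calls next_value() twice and yields 2, while max_int(next_value(), 0) calls it once and yields 1. Mixed signedness shows the type issue: MAX_MACRO(-1, 1u) compares in unsigned arithmetic and yields UINT_MAX, whereas max_int(-1, 1u) yields 1.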
Can you put a number on this, like 75.3% of the cases?
What sampling method is used, over what kind of data, to arrive at the
statistic?
Yeah, I think he is assuming that most inline functions are large...
this is maybe about the same as my assuming that most functions are non-leaf
functions...
but, then again, I have written enough code to be almost certain that this
is the case:
only a small minority of code is leaf functions;
there is little reason to assume that it is the leaf functions which are
eating the running time.
What if the inline code is bloated, but it's in a tight loop that fits
nicely into the cache?
No, it isn't; see the MIPS code above.
Yeah.
It depends on the arch...
Personally, I suspect that function calls are cheaper on 32-bit x86 than on
x86-64 (SysV and Win64), since there are so few registers that there is not
nearly so much worry about the cost of spills.
Typically, calls into shared libraries use indirect jumps.
Yeah, or at least with ELF and friends...
With PE/COFF, only non-local calls, i.e. calls imported from another DLL,
are indirect (at least on x86 and x86-64; I know little about MIPS...).
See the use of the branch delay slot in the MIPS code: something can be put
into the pipeline even though a branch is happening. (Though this is now part
of the instruction set architecture and behaves the same way regardless
of whether the hardware actually has a delay slot, or how large it is;
if the hardware implementation has a two-cycle stall in the pipeline for a
branch, you still get just one slot to fill.)
ok.