Bill said:
OK, I'm officially confused... I specifically said "two seconds or 20
seconds" was "ten-fold" (as in, a thousand is a hundred "ten-fold"),
and then you redefined "fold" as "ten times slower".
Sorry, I was confusing "fold" with "orders of magnitude". Properly
used, "N-fold" means multiplication by a factor of N, which makes what
you said correct and what I said wrong.
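To make the arithmetic concrete, here is a trivial C sketch (the function
name `fold_change` is invented for illustration): an N-fold change is
simply the ratio of the new value to the old one.

```c
/* Hypothetical helper: an "N-fold" change is the ratio after / before,
   so going from 2 s to 20 s (or from 100 to 1000) is ten-fold. */
double fold_change(double before, double after)
{
    return after / before;
}
```

With this definition, going from 2 s to 20 s gives 10.0, and so does
going from 100 to 1000. A ten-fold change also happens to be exactly one
order of magnitude, which is how the two terms got mixed up; a
hundred-fold change would be two orders of magnitude.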
"6" was the magic number I was told (or actually read) to work with...
This may be true, and I don't doubt that it depends on the specific
function call and the compiler's optimization "tricks" in any event...
On a lot of modern architectures (x86-64 included, though not 32-bit
x86) the cost of passing parameters is quite trivial (see below).
I will admit that the ever-increasing power and speed of computer
hardware does make a lot of these considerations practically moot...but
The point is not the speed of the hardware. No matter how fast the
hardware becomes, a tenfold slowdown is STILL a tenfold slowdown. The
point is that x86-64 is not register-starved, which allows the compiler
to pass parameters in registers instead of on the stack. For the kind of
trivial functions we were discussing, it can even let the compiler do
*nothing* to pass parameters, because the values are already sitting in
the registers the callee expects.
Hence, for the kinds of functions we're discussing (small utility
functions to reduce indentation; remember that we're not discussing
function calls *in general*), passing parameters can be optimised down
to zero instructions, and calling the function itself takes a single
instruction. On most modern machines that instruction takes the same
amount of time to execute as a conditional branch.
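For illustration, here is a minimal C sketch of the kind of
indentation-reducing helper being discussed. The `struct item` type and
all function names are hypothetical. It assumes an x86-64 target using
the System V ABI (Linux, the BSDs, macOS), where the first pointer
argument is passed in the `rdi` register rather than on the stack.
Whether the compiler also inlines the helper away entirely depends on the
compiler and its flags; compiling with `gcc -O2 -S` and reading the
generated assembly is the way to check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record type used only for this illustration. */
struct item {
    int id;
    int qty;
    bool active;
};

/* Deeply nested style: every condition adds a level of indentation. */
int total_nested(const struct item *items, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (items[i].active) {
            if (items[i].qty > 0) {
                if (items[i].id != 0) {
                    total += items[i].qty;
                }
            }
        }
    }
    return total;
}

/* Small utility function that pulls the tests out of the loop body.
   Its single pointer argument is passed in a register (rdi) under the
   x86-64 System V ABI if it is called at all; at -O2, GCC and Clang will
   usually inline a static function this small. */
static bool item_counts(const struct item *it)
{
    return it->active && it->qty > 0 && it->id != 0;
}

/* Flattened style: same result, one level of nesting inside the loop. */
int total_flat(const struct item *items, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (item_counts(&items[i]))
            total += items[i].qty;
    }
    return total;
}
```

Both functions return the same total; the only difference is that the
second one trades three levels of nesting for a call to a tiny function.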
that still doesn't excuse the way I saw people code back when it made
a TREMENDOUS practical difference, and these WERE people who had to be
considered the elite systems software engineers in the entire
world...
And that brings us to the conclusion. Your programming tricks and
inlining WERE useful in the age of dinosaur mainframes, when workstation
CPUs were nothing more than glorified microcontrollers. Today even lowly
50-cent microcontrollers don't break a sweat executing deeply nested
function calls (unless you're doing something silly like directly
generating video signals on output pins without the help of graphics
hardware).