ragi said:
As a disadvantage, using a fn ptr requires one more memory access.
Not necessarily!
supposing i call f at location 0x1000
normal function call would translate to
call 0x1000 (no extra memory access req.)
Not necessarily! There are a lot of platforms on which it would
translate to something more like
lea A0, 0x1000 ("load effective address into address register 0")
bsr A0 ("branch to subroutine whose address is in A0")
if f were a function ptr.
lea A0, 0x1000 (load the given constant into an address register)
f();
the call (only the f()) would translate to:
bsr A0
In other words, exactly the same sequence.
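If you want to check this on your own platform, here is a minimal C sketch of the two cases. The names f, call_direct and call_through_pointer are made up for illustration, and what the compiler actually emits depends on the target and the optimization level. Compiling with the assembly-output option (-S on many Unix compilers) lets you compare the two call sites.

```c
/* Hypothetical callee standing in for the f of the discussion. */
int f(void)
{
    return 42;
}

/* Case 1: the callee is named directly at the call site. */
int call_direct(void)
{
    return f();
}

/* Case 2: the callee's address is first placed in a pointer
 * (conceptually, "load the address into a register") and the
 * call is then made through that pointer. */
int call_through_pointer(void)
{
    int (*fp)(void) = f;
    return fp();
}
```

Both functions return the same value; whether the generated instructions differ is up to the compiler and the platform.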
Some processors do have call (or bsr, or bal) instructions that
work on relative addresses, and the optimizer might be able to
take advantage of that for "nearby" subroutines in the same
translation unit. However, it isn't uncommon for the relative
offset to be fairly limited (e.g., no more than 32 KB away), so for
more distant subroutines, or for subroutines in other translation
units, and -especially- for library routines, it is common for a
full "load address" instruction to be used, so that the linker can
more easily relocate the code.
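To make the 32 KB figure concrete, here is a small C sketch of the range check involved. It assumes a signed 16-bit displacement measured from the address of the call itself. Real instruction sets usually measure from a slightly different point (for example the following instruction), and the function name reachable_rel16 is invented for this example.

```c
#include <stdint.h>

/* Returns 1 if a relative call located at 'from' could reach 'to'
 * using a signed 16-bit displacement, 0 otherwise.
 * Assumption: the displacement is simply to - from. */
int reachable_rel16(uint32_t from, uint32_t to)
{
    int64_t disp = (int64_t)to - (int64_t)from;
    return disp >= INT16_MIN && disp <= INT16_MAX;
}
```

Anything outside that window needs the full address, which is where the separate "load address" step comes from.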
Note too that the address constant you have used for the example
is only 16 bits and so gives the impression of fitting into a single
call. If, though, the address does not happen to be in the first 64 KB
of address space (not being there is increasingly common these days),
it would come out more like call 0x00010000 with the call instruction
itself possibly needing 2 bytes and the address 4, for a total of 6
bytes. Now suppose you have a tight loop, such as
while (is_digit(*p++));
then according to the information you gave, this would always take
fewer instructions with the address hard-coded. But observe that
lea A0, 0x00010000
... various init loop stuff ...
HERE: bsr A0
... loop test stuff setting the condition code ...
... increment p
brnz HERE ... branch back if the test result was non-zero
Here, the bsr A0 might take only 2 bytes per iteration. Therefore,
even on processors that have direct subroutine calls by absolute
address, compilers will not necessarily use them all the time, as
it is possible in a loop to amortize the cost of loading the
address over all of the loop iterations. Furthermore, when the
code is handled that way, the processor is more likely to be able
to cache the instructions that form the loop in a very fast cache,
possibly executing the entire loop at the microcode level, since
it would not have to assume that it was fetching new instructions
from main memory.
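The same loop can be sketched at the C level in both forms. This is only an illustration: is_digit here is a hypothetical stand-in for whatever the real routine is, and skip_digits_direct / skip_digits_indirect are names invented for the example. In the indirect version the address arrives once, as the parameter pred, before the loop starts, which mirrors the single lea ahead of HERE.

```c
/* Hypothetical stand-in for the is_digit() in the loop above. */
int is_digit(int c)
{
    return c >= '0' && c <= '9';
}

/* Direct form: the callee is named inside the loop. */
const char *skip_digits_direct(const char *p)
{
    while (is_digit((unsigned char)*p++))
        ;
    return p - 1;   /* p overshot by one; back up to the first non-digit */
}

/* Indirect form: the callee's address is supplied once in 'pred'
 * and every iteration calls through it. */
const char *skip_digits_indirect(const char *p, int (*pred)(int))
{
    while (pred((unsigned char)*p++))
        ;
    return p - 1;
}
```

Both versions stop at the same character; which one is smaller or faster is something only the generated code for a particular compiler and processor can tell you.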