Function pointers: performance penalty?

Rui Maciel

Is there a performance penalty associated with the use of function pointers? If so, how bad is it?


Thanks in advance,
Rui Maciel
 
Nick Keighley

Is there a performance penalty associated with the use of
function pointers? If so, how bad is it?

Thanks in advance,
Rui Maciel

yes. Small. Say one extra instruction and extra memory access.

If you don't use a function pointer, what will you do instead,
and is *that* cheaper?
 
jacob navia

Rui Maciel wrote:
Is there a performance penalty associated with the use of function pointers? If so, how bad is it?


Thanks in advance,
Rui Maciel

This depends on the CPU, the depth of the pipeline (if any) and many other things.
If there is a pipeline, coupled with speculative execution, an indirect jump
will provoke a flush of the pipeline, since the instructions after the indirect jump
can't be known, unless the target of the jump is loaded into the register very early
and the speculative execution machinery can determine that it won't be affected by
subsequent operations...

A call instruction has its destination embedded in the code stream. The CPU will know it
in advance and will know that it will not change. Using that knowledge, it can start
executing the call instruction in advance.

Within the context of lcc-win, I was forced to use ONLY indirect calls for ALL
function calls. This slowed down the code considerably, by more than 15-20%.

If you are using the indirect jump within a tight loop, it is better if you
could avoid it, but if you are using an indirect call anyway, it is because of
some reason, and that is not going to change and it is not going to be replaced
by something else.
 
Seebs

If the code is portable, or even designed to run on different versions of
the same operating system, or on different but binary-compatible hardware,
then that's not a very viable strategy. It will tell you something, but the
figures could change when the Pentium is replaced by the Hexium.

And for much the same reason, asking people what the performance penalty
is won't do you much better.

-s
 
Nick Keighley

You could suggest that it is, in fact, faster for an oft called
function. No 32/64 bit address read required each and every call.

what? The function address will be sitting in a register?
I'd not thought of that.
 
Phil Carmody

Rui Maciel said:
Is there a performance penalty associated with the use of function pointers? If so, how bad is it?

Nope, function pointers are faster.

Phil
 
jacob navia

Phil Carmody wrote:
Nope, function pointers are faster.

Phil

What? An indirect call using a function pointer is faster than a direct
call?

You are just talking nonsense (as always).
 
Antoninus Twink

Phil Carmody wrote:

What? An indirect call using a function pointer is faster than a direct
call?

You are just talking nonsense (as always).

Yes.

"Phil" is obviously clueless - a couple days ago he showed us all that
he doesn't know what a heap is, now we learn that he's completely
ignorant about function pointers too.

All Carmodey brings to this group is a sycophantic blind support for
Heathfield, and a lot of bile directed against anyone and everyone who
dares to point out how inadequate his purported knowledge of C is.
 
Seebs

What? An indirect call using a function pointer is faster than a direct
call?
You are just talking nonsense (as always).

It's actually not totally impossible in at least one fairly common case.

Imagine that you are in the fairly normal case on a modern system where
many functions you call are reached through dynamically-linked libraries.
When you call such a function, you actually call through a little bit of
dummy code which makes sure the right library is loaded and then rewrites
itself to be a jump to the correct function. However, if you have obtained
a pointer to the function, you may well have a pointer to the "real" function
-- the actual address at which something was loaded. In which case the call
through the function pointer may be one step faster.

I don't think it makes sense to try to come up with a general answer. The
indirect call and direct call are not intrinsically different; ultimately,
a "direct" call is still a call through a pointer anyway.

-s
 
Seebs

All Carmodey brings to this group is a sycophantic blind support for
Heathfield, and a lot of bile directed against anyone and everyone who
dares to point out how inadequate his purported knowledge of C is.

Wow, he sure pissed in your anonymous cheerios!

-s
 
Seebs

A human has got some insight into how a processor works. He can predict the
future to some extent, if an expert. (I'm not, I didn't realise the cache
effects could be as severe as Jacob describes).

On OOO processors, I'm told, it's quite common for it to be impossible to
know until runtime which of two things will be faster in a given case...

-s
 
Stephen Sprunk

Nick said:
what? The function address will be sitting in a register?
I'd not thought of that.

On every system I'm familiar with, yes. For instance, on x86, this is a
direct call:

CALL 0xDEADBEEF

and this is an indirect call:

CALL EAX

One obviously has to load the function address into the register before
that instruction, of course, but that can often be hoisted quite a bit,
making the memory/cache access latency disappear.
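The same distinction can be seen at the C level. The following is a minimal sketch, not taken from the thread: add_one, call_direct, call_indirect and demo_calls are hypothetical names chosen for illustration. What instructions a compiler actually emits for either form depends on the compiler, the optimisation level and the target.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical example function; any function with this signature would do. */
static int add_one(int x)
{
    return x + 1;
}

/* Direct call: the callee is named in the source, so the compiler can
   emit a call with a fixed target (or inline it). */
int call_direct(int x)
{
    return add_one(x);
}

/* Indirect call: the callee is known only through the pointer value,
   which is typically loaded into a register before the call. */
int call_indirect(int (*fn)(int), int x)
{
    return fn(x);
}

/* Small driver: both forms must compute the same result. */
void demo_calls(void)
{
    int (*fp)(int) = add_one;   /* the function name converts to its address */

    assert(call_direct(41) == call_indirect(fp, 41));
    printf("direct:   %d\n", call_direct(41));
    printf("indirect: %d\n", call_indirect(fp, 41));
}
```

Calling demo_calls() from main and comparing the generated assembly for call_direct and call_indirect (for example with the compiler's assembly-output option) is one way to see what a given implementation does.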

S
 
jacob navia

Gordon Burditt wrote:
Why? What would force you to do that?

The 64 bit AMD-Intel processor is not a fully 64 bit processor.
The CALL instruction accepts a signed 2GB offset. If you want the
code that you generate to be able to be relocated in memory
ANYWHERE within the 64 bit address space you can't use plain
call instructions and you have to go through an indirect
call.
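The reach limit can be written down as plain arithmetic. This is a sketch under stated assumptions: it models only the near CALL with a 32-bit displacement (opcode E8 plus four displacement bytes, five bytes in total), whose displacement is measured from the end of the instruction. The names call_rel32_reaches and check_reach_examples are hypothetical and the addresses are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Length in bytes of the near CALL with a 32-bit displacement
   (opcode E8 followed by 4 displacement bytes). */
#define CALL_REL32_LEN 5u

/* Returns true if a CALL rel32 located at call_site can reach target.
   The signed 32-bit displacement is relative to the next instruction. */
bool call_rel32_reaches(uint64_t call_site, uint64_t target)
{
    uint64_t next = call_site + CALL_REL32_LEN;

    if (target >= next)                                 /* forward call  */
        return target - next <= (uint64_t)INT32_MAX;
    return next - target <= (uint64_t)INT32_MAX + 1u;   /* backward call */
}

/* Illustrative addresses only, not taken from any real program. */
void check_reach_examples(void)
{
    /* A target 4 KiB away is reachable with a direct call. */
    assert(call_rel32_reaches(0x400000u, 0x401000u));
    /* A target 4 GiB away is not, so it needs an indirect call or a thunk. */
    assert(!call_rel32_reaches(0x400000u, 0x400000u + 0x100000000ULL));
}
```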
 
Stephen Sprunk

Rui said:
Is there a performance penalty associated with the use of function
pointers? If so, how bad is it?

It completely depends on the system you're using; on some it may be
significant, while on others it may be negligible. Measure for yourself
and find out.

<OT>
This comes down to the cost of a direct call vs. an indirect call. An
indirect call costs you an extra register, though that probably won't
hurt. More importantly, an indirect call requires that the CPU
correctly predict the _target_ of the call. Modern high-end CPUs have a
branch target predictor that is responsible for that, similar to the
branch predictor that is responsible for guessing whether a conditional
branch is taken or not. If either of those predictions is wrong, the
pipeline will stall, which could burn dozens or even hundreds of cycles.

In the context of function pointers, if you use the same one over and
over, the cost is likely to be negligible--at least after the first
time. If you keep changing the pointer, though, you're going to have to
pay that cost _every time you change it_.

Embedded CPUs may not have a branch target predictor at all because it
adds die size, heat, power consumption, and cost. In that case, an
indirect branch is going to hurt every time, regardless of whether the
pointer is changed, because the CPU isn't even _attempting_ to predict
what the target is and has to stall the pipeline to find out.
</OT>
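One way to follow the "measure for yourself" advice is a small timing harness. The sketch below is only an illustration: step_a, step_b, run_indirect, time_indirect and report are hypothetical names, clock() is a coarse timer, and the figures will differ between CPUs, compilers and optimisation levels. It calls through a pointer that either stays the same on every call (mask 0) or is chosen pseudo-randomly between two targets (mask 1).

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

typedef unsigned (*step_fn)(unsigned);

/* Two hypothetical call targets with different bodies. */
static unsigned step_a(unsigned x) { return x + 1u; }
static unsigned step_b(unsigned x) { return x + 2u; }

/* Makes `iterations` indirect calls. mask must be 0 or 1:
   0 always calls step_a, 1 picks step_a or step_b pseudo-randomly. */
unsigned run_indirect(unsigned long iterations, unsigned mask)
{
    /* volatile so the pointer is reloaded on every call and the compiler
       cannot turn the call into a direct one */
    static step_fn volatile table[2] = { step_a, step_b };
    unsigned long seed = 12345UL;   /* state of a simple linear congruential generator */
    unsigned acc = 0;

    for (unsigned long i = 0; i < iterations; i++) {
        seed = seed * 1103515245UL + 12345UL;
        acc = table[(seed >> 16) & mask](acc);
    }
    return acc;
}

/* Returns the processor time in seconds used by one run. */
double time_indirect(unsigned long iterations, unsigned mask)
{
    clock_t start = clock();
    volatile unsigned sink = run_indirect(iterations, mask);
    clock_t end = clock();

    (void)sink;   /* keep the result alive so the loop is not discarded */
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Prints both timings; the generator cost is the same in both runs. */
void report(unsigned long iterations)
{
    assert(run_indirect(10UL, 0u) == 10u);   /* sanity check before timing */
    printf("same target:    %.3f s\n", time_indirect(iterations, 0u));
    printf("varying target: %.3f s\n", time_indirect(iterations, 1u));
}
```

Calling report() from main with a large iteration count, several times and on each implementation of interest, gives a rough comparison; a small or inconsistent difference is itself a useful result.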

S
 
jacob navia

Richard wrote:
It's no more indirect than anything else if the compiler does its job in
the case of an oft called address stored in a register.

You mean

double (*fnptr)(double) = sqrt;
fnptr(2.3);


Yes, in this case it *could* be optimized, and maybe some
compiler does optimize this... But this usage of function pointers
is highly unusual... We are speaking about the general case of course, when
it is NOT known which address is going to be called.
 
Ben Bacarisse

Malcolm McLean said:
With an aggressively optimising compiler there shouldn't be much of a
penalty, in typical use.
This is because you use a function pointer to make a function more general.
The classic example from the library is qsort(). Typically this will be used
as

int compfunc(const void *e1, const void *e2)
{
    /* qsort() passes pointers to the array elements, which are char * here */
    const char *const *str1 = e1;
    const char *const *str2 = e2;

    return strcmp(*str1, *str2);
}

...
qsort(array, N, sizeof(char *), compfunc);
...

So if there is significant penalty in calling a function indirectly, in
machine code, a compiler can replace the indirect jump with a hard-linked
one, or even inline the function totally.

This is theoretically possible, but does it ever happen in a C
compiler? In many implementations, qsort is just a function compiled
long ago that simply has code in it to do an indirect call. Have you
come across a system that does inline or re-write this call?
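The difference between the two situations can be sketched in code. In the example below, qsort() is reached only through its declaration, while sort_strings_local (a hypothetical insertion sort written for this illustration) has its body in the same translation unit as the comparator. The second case is the one where an optimiser at least has the chance to replace the indirect call; whether a particular compiler does so is an assumption to be checked on that compiler, not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for an array of char * elements, in the form qsort() expects. */
static int compare_strings(const void *e1, const void *e2)
{
    const char *const *s1 = e1;
    const char *const *s2 = e2;

    return strcmp(*s1, *s2);
}

/* Hypothetical local sort: an insertion sort whose body, unlike a
   precompiled qsort(), is visible where the comparator is passed. */
static void sort_strings_local(const char **a, size_t n,
                               int (*cmp)(const void *, const void *))
{
    for (size_t i = 1; i < n; i++) {
        const char *key = a[i];
        size_t j = i;

        /* shift larger elements right until key's position is found */
        while (j > 0 && cmp(&a[j - 1], &key) > 0) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}

/* Sorts a with the library routine and b with the local routine. */
void sort_both_ways(const char **a, const char **b, size_t n)
{
    assert(a != NULL && b != NULL);
    qsort(a, n, sizeof a[0], compare_strings);
    sort_strings_local(b, n, compare_strings);
}
```

Both calls pass the same function pointer; only the visibility of the sorting routine's body differs.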
 
Flash Gordon

Stephen said:
It completely depends on the system you're using; on some it may be
significant, while on others it may be negligible. Measure for yourself
and find out.

and repeat on all implementations of interest, since each one might be
different. That includes each version of each processor even if you
are using the same binary in case the instruction speeds vary (which
they sometimes do).

Embedded CPUs may not have a branch target predictor at all because it
adds die size, heat, power consumption, and cost. In that case, an
indirect branch is going to hurt every time, regardless of whether the
pointer is changed, because the CPU isn't even _attempting_ to predict
what the target is and has to stall the pipeline to find out.
</OT>

<OT>

It may still not hurt... some processors have instructions such as,
"branch in two instructions time", "call in two instructions time" and
"return in two instructions time" so that the instruction pipeline is
not broken.
 
BGB / cr88192

jacob navia said:
Gordon Burditt wrote:

The 64 bit AMD-Intel processor is not a fully 64 bit processor.
The CALL instruction accepts a signed 2GB offset. If you want the
code that you generate to be able to be relocated in memory
ANYWHERE within the 64 bit address space you can't use plain
call instructions and you have to go through an indirect
call.


this still need not force all calls to be full 64-bits though...

if the code produced is PIC code, then one can simply use relative jumps and
calls, and make the simplifying assumption that all calls are within the
+-2GB window.

at link time, if this assumption fails for some reason, it can be thunked.
so, with a little creative handling, nearly all such long-distance calls can
be eliminated...


the problem then is Linux x86-64, which has a bad habit of wanting people to
use the GOT for everything (however, with a tiny amount of creative code and
some link-time trickery, likely this could be worked around as well...).

so, really, I am not sure what is the reason for your issues...


for variables, it is slightly harder.

however, Windows itself imposes some simplifying assumptions, so this can be
largely ignored as well.

in Linux-land though, one might be forced (absent linker trickery) to resort
to using the GOT...


or such...
 
Stephen Sprunk

jacob said:
Gordon Burditt wrote:

The 64 bit AMD-Intel processor is not a fully 64 bit processor.
The CALL instruction accepts a signed 2GB offset. If you want the
code that you generate to be able to be relocated in memory
ANYWHERE within the 64 bit address space you can't use plain
call instructions and you have to go through an indirect
call.

Last I heard, nobody has yet implemented the x86-64 ABI's "large" code
model. Difficulty with the CALL instruction is probably one major
reason for that; AMD didn't leave any particularly good way to implement
it, though if demand ever arises, I bet we'll see a new CALL opcode
appear that will accept a 64-bit offset.

The other code models require all code to be in the first 4GB of RAM
(positive 2GB for user code, negative 2GB for kernel code), just like
x86, and the rest of memory is for data only.

S
 
