function pointer, (automatic) inlining, performance

D

David Mathog

Assume there are N (around 10) functions like:
func1(void *, void*, int )
...
funcN(void*,void*,int)

A subset of these are to be called in the inner loop of a program,
conditionally, but always in numerical order. That is, it might be
1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
ways to code this (here "args" is filled in with the appropriate
values using simple code that isn't
relevant to the point under discussion here):

1),
if(test1)func1(args);
if(test2)func2(args);
...
if(testN)funcN(args);

2)
store function pointers in a list and do:
for(i=0;i<listlen;i++){
list(args); /* list holds function pointer to funcI */
}

3.
use an enum selector with entries {USEFUNC1,USEFUNC2,... USEFUNCN}
and store those selectors in a list:

for(i=0;i<listlen;i++){
switch (list){
case USEFUNC1: func1(args); break;
case USEFUNC2: func2(args); break;
...
case USEFUNCN: funcN(args); break;
}
}

The functions all perform relatively small operations and could be
inlined. Except, I don't think they can be for (2), because of the
function pointers. Since this is in the inner loop, the resulting
function call overhead is likely to be significant compared to the
inlined access. If case (3) always reduced to the equivalent of a
Fortran computed goto it would be faster than (1), since it could skip
the unnecessary if tests. However, I don't think the standard
guarantees that, and it might well reduce to the same code. In any
case, it looks like (3) is likely to be the fastest, and in any case,
should not be slower than (1). (2) would be fastest (never any extra
logic tests) if the function pointers could be inlined, but is that
possible in this case?

Are there any other options for this situation?

Thanks,

David Mathog
 
N

Nobody

Assume there are N (around 10) functions like:
func1(void *, void*, int )
...
funcN(void*,void*,int)

A subset of these are to be called in the inner loop of a program,
conditionally, but always in numerical order. That is, it might be
1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
ways to code this (here "args" is filled in with the appropriate
values using simple code that isn't relevant to the point under
discussion here):

In the absence of additional information, I would assume that option #1
would be the fastest, as it is the most restrictive case. The compiler
knows which functions can be called and in which order, so it enables
optimisations such as allowing later functions to use values computed in
earlier functions.

In theory, a call through a function pointer could be inlined if the
compiler can "see" all of the possible values which the pointer could
take, but I've never observed this optimisation in practice.

With regard to #3, even if the compiler could determine that there are
certain constraints upon the ordering of the list elements, it's much
harder for it to actually use this information than is the case for #1.

As for your comment about "computed goto": unless the set of cases is
huge, a tree of branch instructions is likely to be more efficient. A
modern CPU will speculatively execute a branch instruction and the
instructions following it, but if cannot calculate the destination of a
computed goto immediately, it will simply stall until the data is
available.
 
D

Dr Nick

How much of the order do you know at compile time?

And - orthogonal to this - are you better including the actual code in
the if-ladder or the switch rather than relying on functions and inlining?
 
D

David Mathog

How much of the order do you know at compile time?

Well the order is 1 before 2 before 3 ... before 10. but the sequence
can skip some positions. The situation is like dropping a ball down
stairs, it will always land on a lower stair, but can and usually will
skip some in between. The exact sequence is not knowable at compile
time. However, once specified at run time it is fixed for all further
processing. It would be nice to be able to exploit this information.
The sequence of operations is known, it really should not require any
sort of a conditional to get to it. (And it wouldn't in, say,
assembler.) Using the gcc goto extension I guess one could implement
the conditional free function along these line (pseudocode, for the
case that hits 3, 6, and 7):

/* in another function prepare a list describing the operation order
*/
list []=
({&&label3, parameter3},
{&&label6, parameter6},
{&&label7,parameter7},
(&&label11,0}};

/* run the list in this function */
goto list->label;
label1: func1(list->param); list++; goto list->label;
label2: func2(list->param); list++; goto list->label;
label3: func3(list->param); list++; goto list->label;
....
label9: func9(list->param); list++; goto list->label;
label10: func10(list->param); list++; goto list->label;
label11: return;

On second thought, that gcc goto extension may not be an option, since
the function where the list would be created is not the same as the
pseudocode function shown above, so using the && to generate the label
addresses would be a problem. And that type of goto isn't standard in
any case.
And - orthogonal to this - are you better including the actual code in
the if-ladder or the switch rather than relying on functions and inlining?

Possibly. It would definitely be a lot harder to read. Each of these
code blocks (whether as a separate function or in an if ladder) is an
operation on the same string, and the N+1th operation cannot do
anything much until the Nth has completed. So while the compiler can
optimize within each code block, there probably isn't much it could do
between them. As Nobody suggested, the CPU might be able to make some
headway with speculative execution within the if ladder.

Regards,

David Mathog
 
E

Eric Sosman

Assume there are N (around 10) functions like:
func1(void *, void*, int )
...
funcN(void*,void*,int)

A subset of these are to be called in the inner loop of a program,
conditionally, but always in numerical order. That is, it might be
1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
ways to code this (here "args" is filled in with the appropriate
values using simple code that isn't
relevant to the point under discussion here):

1),
if(test1)func1(args);
if(test2)func2(args);
...
if(testN)funcN(args);

2)
store function pointers in a list and do:
for(i=0;i<listlen;i++){
list(args); /* list holds function pointer to funcI */
}

3.
use an enum selector with entries {USEFUNC1,USEFUNC2,... USEFUNCN}
and store those selectors in a list:

for(i=0;i<listlen;i++){
switch (list){
case USEFUNC1: func1(args); break;
case USEFUNC2: func2(args); break;
...
case USEFUNCN: funcN(args); break;
}
}

The functions all perform relatively small operations and could be
inlined. Except, I don't think they can be for (2), because of the
function pointers. Since this is in the inner loop, the resulting
function call overhead is likely to be significant compared to the
inlined access. If case (3) always reduced to the equivalent of a
Fortran computed goto it would be faster than (1), since it could skip
the unnecessary if tests. However, I don't think the standard
guarantees that, and it might well reduce to the same code. In any
case, it looks like (3) is likely to be the fastest, and in any case,
should not be slower than (1). (2) would be fastest (never any extra
logic tests) if the function pointers could be inlined, but is that
possible in this case?

Are there any other options for this situation?


Sure. For example,

switch (bitmask_of_funcs_to_call) {
case 1:
func1(args);
break;
case 2:
func2(args);
break;
case 3:
func1(args);
func2(args);
break;
case 4:
func3(args);
break;
...

Of course, with "around 10" functions the number of cases in the
switch might stress the compiler a little ... You could hammer
it back down to semi-reasonable size with a sequence of switch'es
on subsets of the bits:

switch (bitmask_of_funcs_to_call & 07) ...
switch (bitmask_of_funcs_to_call & 070) ...
switch (bitmask_of_funcs_to_call & 0700) ...
...

The technical term for this kind of transformation is "Dumb Idea."

Stick with your original #1. If the tests are expensive and their
results don't change during the loop, you might precompute them:

int flag1 = test1();
int flag2 = test2();
...
while (looping) {
if (flag1) func1(args);
if (flag2) func2(args);
...

If the test results *do* change during the loop, then please don't
overlook the cost of maintaining list[] in proposals #2 and #3.
Also, note that #2 and #3 eliminate the specific tests you're so
worried about, but introduce new tests of their own like "Should
I iterate again?" and "Is the switch value in range of the cases?"
 
P

Phil Carmody

David Mathog said:
Assume there are N (around 10) functions like:
func1(void *, void*, int )
...
funcN(void*,void*,int)

A subset of these are to be called in the inner loop of a program,
conditionally, but always in numerical order. That is, it might be
1,3,5, or 6, or 9,10, but never 2,1 or 2,3,2. I can think of three
ways to code this ....
Are there any other options for this situation?

Yes. Profile and find what's slowing you down before you attempt
to optimise anything.

Phil
 
D

David Mathog

     Stick with your original #1.  If the tests are expensive andtheir

The tests are not expensive. In the current incarnation each is just
a bit test on the same integer.

The consensus here seems to be to stick with #1, so that's what I will
do.

Thanks all,

David Mathog
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,066
Latest member
VytoKetoReviews

Latest Threads

Top