Costs of function calls

T

Thomas Kowalski

Hello everybody,
currently I am tweaking a bit on one of my projects and wondered how
high are the costs for calling a (member) function. I tried some
possibilites and come to a suprising result.
Maybe someone can explain to me what I might improve on my test or how
come this results occured?
For testing I commented different lines out. Code see below.

Regards,
Thomas Kowalski



// MS C++ 8.0
// 1 Billion additions:
// simple methode call: 2134 ms, 2123 ms,
// inline call: 2037 ms, 2053 ms, 2073 ms
// simple methode ref param: 2153 ms, 2034 ms, 2093 ms
// addition no methode 2034 ms, 2037 ms, 2032 ms
// virtual funtion call ~2100 ms
// function pointer 7060 ms, 7260 ms, 7260 ms

// Results with gcc 3.3.6 are similar but function pointer are as fast
as normal calls

#include "stdafx.h"
#include <iostream>
#include <cstdlib>
#include <vector>
#include <time.h>

using namespace std;

class Abstract {
public:
virtual size_t add(size_t& a, size_t& b) = 0;
};

class B :public Abstract {
public:
virtual size_t add(size_t& a, size_t& b);
};


class C :public Abstract {
public:
virtual size_t add(size_t& a, size_t& b);
};

class D :public Abstract {
public:
virtual size_t add(size_t& a, size_t& b);
};
/**/

class A {
public:
static size_t add(size_t& a, size_t& b);
};


//inline
inline size_t A::add(size_t& a, size_t& b) {
return a+b;
};

size_t B::add(size_t& a, size_t& b) {
return a+b;
};


size_t C::add(size_t& a, size_t& b) {
return a+b;
};

size_t D::add(size_t& a, size_t& b) {
return a+b;
};


int main(int argc, char *argv[])
{

// Abstract* a = new D();
A a;
size_t (*addition)(size_t&, size_t&);
addition = &A::add;
const size_t REPEAT = 1000000000;
size_t i,j,k;
j=5;

clock_t start = clock();
for (i=0; i<REPEAT; ++i) {
addition(i, k);
// k = a.add(i,j);
// k = i+j;
}
clock_t ende = clock();
cout << "K ist: " << k << endl;

cout << "test took " << ((ende - start)) << " ms" << endl;
cin >> i;
return 0;
}
 
V

Victor Bazarov

Thomas said:
Hello everybody,
currently I am tweaking a bit on one of my projects and wondered how
high are the costs for calling a (member) function. I tried some
possibilites and come to a suprising result.
Maybe someone can explain to me what I might improve on my test or how
come this results occured?
For testing I commented different lines out. Code see below.

Regards,
Thomas Kowalski



// MS C++ 8.0
// 1 Billion additions:
// simple methode call: 2134 ms, 2123 ms,
// inline call: 2037 ms, 2053 ms, 2073 ms
// simple methode ref param: 2153 ms, 2034 ms, 2093 ms
// addition no methode 2034 ms, 2037 ms, 2032 ms
// virtual funtion call ~2100 ms
// function pointer 7060 ms, 7260 ms, 7260 ms

// Results with gcc 3.3.6 are similar but function pointer are as fast
as normal calls

[..]

To make a real case of inline versus non-inline, you _must_ place your
non-inlined member functions in a separate translation unit. Otherwise
any decent compiler should be able to simply inline all your calls if
it can see the actual body.

I _know_ that if a function is not inlined, it's gonna costcha. Build
your project with optimization for size, use 'std::vector' and its
indexing argument, and you'll notice that if you switch to indexing
from a pointer to the first element instead, you'll get plenty of
performance improvement. Up to ten times, as a matter of fact, in
some cases.

V
 
T

Thomas Kowalski

To make a real case of inline versus non-inline, you _must_ place your
non-inlined member functions in a separate translation unit. Otherwise
any decent compiler should be able to simply inline all your calls if
it can see the actual body.

Ok, that makes sense. But how can it inline the virtual funtion call?
Abstract *a = new B();
a->add(i,j);

Can this still be inlined?
I _know_ that if a function is not inlined, it's gonna costcha. Build
your project with optimization for size, use 'std::vector' and its
indexing argument, and you'll notice that if you switch to indexing
from a pointer to the first element instead, you'll get plenty of
performance improvement. Up to ten times, as a matter of fact, in
some cases.

You mean that if you have an std::vector and use a pointer to its
values its several times slower to access a certain element then using
an index of type int or unsigned int?

Thanks a lot,
Thomas Kowalski
 
?

=?iso-8859-1?q?Erik_Wikstr=F6m?=

You mean that if you have an std::vector and use a pointer to its
values its several times slower to access a certain element then using
an index of type int or unsigned int?

No, the other way around:

std::vector<int> vec;
// Push some elements

int& a = vec[5]; // Using index

int* ptr = &(vec[0]);
int& b = *(ptr + 5) // Using pointer

The second (using pointer) will be faster than the first (using index)
provided that you don't count the cost of getting the pointer in the
first place. This can be useful when optimising loops iterating over
all the elements in a vector like:

int* ptr = &(vec[0]);
int size = vec.size();
for (int i = 0; i < size; ++i)
*(ptr + i) += 2; // Instead of vec += 2;
 
R

Rolf Magnus

Thomas said:
Ok, that makes sense. But how can it inline the virtual funtion call?
Abstract *a = new B();
a->add(i,j);

Can this still be inlined?

Basically, it can, because the compiler could know that in that particular
place the object can only be a B, but I have no idea if compilers do such
optimizations.
You mean that if you have an std::vector and use a pointer to its
values its several times slower to access a certain element then using
an index of type int or unsigned int?

I think that he means just the opposite. Last time I measured iterating a
vector, indexing was generally slower than using a pointer or an iterator
and increment it in the loop. No difference between indexing using a
pointer and using vector's operator[].
 
O

Ole Nielsby

Erik Wikström said:
int& a = vec[5]; // Using index

int* ptr = &(vec[0]);
int& b = *(ptr + 5) // Using pointer

The second (using pointer) will be faster than the first (using index)
provided that you don't count the cost of getting the pointer in the
first place. This can be useful when optimising loops iterating over
all the elements in a vector like:

int* ptr = &(vec[0]);
int size = vec.size();
for (int i = 0; i < size; ++i)
*(ptr + i) += 2; // Instead of vec += 2;


Or shorter:

for (std::vector<int>::iterator i = vec.begin(); i < vec.end(); i++)
*i += 2;

which does the same and produces similar code on modern
compilers tuned to deal with STL (tested on vc8).
 
R

Rolf Magnus

Ole said:
int* ptr = &(vec[0]);
int size = vec.size();
for (int i = 0; i < size; ++i)
*(ptr + i) += 2; // Instead of vec += 2;


Or shorter:

for (std::vector<int>::iterator i = vec.begin(); i < vec.end(); i++)
*i += 2;


And here you can see what it looks like if you go one step further in using
algorithms:

std::transform(vec.begin(), vec.end(), vec.begin(),
std::bind2nd(std::plus<int>(), 2));

This is harder to write, harder to read and not shorter than the for loop.
 
V

Victor Bazarov

Ole said:
Erik Wikström said:
int& a = vec[5]; // Using index

int* ptr = &(vec[0]);
int& b = *(ptr + 5) // Using pointer

The second (using pointer) will be faster than the first (using
index) provided that you don't count the cost of getting the pointer
in the first place. This can be useful when optimising loops
iterating over all the elements in a vector like:

int* ptr = &(vec[0]);
int size = vec.size();
for (int i = 0; i < size; ++i)
*(ptr + i) += 2; // Instead of vec += 2;


Or shorter:

for (std::vector<int>::iterator i = vec.begin(); i < vec.end(); i++)
*i += 2;

which does the same and produces similar code on modern
compilers tuned to deal with STL (tested on vc8).


Tested with what optimization settings? This code has to instantiate
the vector<int>::iterator vec.size() times (since you used i++ and not
++i). Besides, it has to compare the iterators, it has to dereference
the 'i'. Don't just jump to conclusions. Didn't I say I profiled
that sort of code enough to claim pointers are the fastest?

Bottomline: avoid function calls like the plague!

V
 
V

Victor Bazarov

Rolf said:
[..] Last time I measured
iterating a vector, indexing was generally slower than using a
pointer or an iterator and increment it in the loop. No difference
between indexing using a pointer and using vector's operator[].

Not true. Indexing using a pointer is still faster because it does
not involve a function call. That's the whole point!

V
 
L

Lionel B

Ole said:
Erik Wikström said:
int& a = vec[5]; // Using index

int* ptr = &(vec[0]);
int& b = *(ptr + 5) // Using pointer

The second (using pointer) will be faster than the first (using
index) provided that you don't count the cost of getting the pointer
in the first place. This can be useful when optimising loops
iterating over all the elements in a vector like:

int* ptr = &(vec[0]);
int size = vec.size();
for (int i = 0; i < size; ++i)
*(ptr + i) += 2; // Instead of vec += 2;


Or shorter:

for (std::vector<int>::iterator i = vec.begin(); i < vec.end(); i++)
*i += 2;

which does the same and produces similar code on modern
compilers tuned to deal with STL (tested on vc8).


Tested with what optimization settings? This code has to instantiate
the vector<int>::iterator vec.size() times (since you used i++ and not
++i). Besides, it has to compare the iterators, it has to dereference
the 'i'.


Surely you have to compare pointers (or ints, or whatever) to detect the
loop termination condition, and you have to dereference a pointer too...?
Ok, the iterator version may involve function calls but these are quite
likely trivial enough to be optimised away by any compiler worth its salt.
Don't just jump to conclusions. Didn't I say I profiled
that sort of code enough to claim pointers are the fastest?

Bottomline: avoid function calls like the plague!

....except if you are certain they will be magicked away by your
compiler.
 
V

Victor Bazarov

Lionel said:
Ole said:
int& a = vec[5]; // Using index

int* ptr = &(vec[0]);
int& b = *(ptr + 5) // Using pointer

The second (using pointer) will be faster than the first (using
index) provided that you don't count the cost of getting the
pointer in the first place. This can be useful when optimising
loops iterating over all the elements in a vector like:

int* ptr = &(vec[0]);
int size = vec.size();
for (int i = 0; i < size; ++i)
*(ptr + i) += 2; // Instead of vec += 2;

Or shorter:

for (std::vector<int>::iterator i = vec.begin(); i < vec.end(); i++)
*i += 2;

which does the same and produces similar code on modern
compilers tuned to deal with STL (tested on vc8).


Tested with what optimization settings? This code has to instantiate
the vector<int>::iterator vec.size() times (since you used i++ and
not ++i). Besides, it has to compare the iterators, it has to
dereference the 'i'.


Surely you have to compare pointers (or ints, or whatever) to detect
the loop termination condition, and you have to dereference a pointer
too...? Ok, the iterator version may involve function calls but these
are quite likely trivial enough to be optimised away by any compiler
worth its salt.


Generally speaking, no. Never assume. Especially about your compiler
(which you may be required to use, instead of picking one "worth its
salt").
...except if you are certain they will be magicked away by your
compiler.

You can never be certain unless you actually see the assembly code.
You can never have enough time to look through all assembly code if
your project is beyond the simplest few KLOC. Conclusion: never
trust your compiler to do "the right thing" when it comes to your
program's performance. Measure and make appropriate changes.

V
 
L

Lionel B

On Wed, 20 Dec 2006 08:48:27 -0500, Victor Bazarov wrote:

/.../


...except if you are certain they will be magicked away by your
compiler.

Ok, here's a small test program:

--- begin code ---

#include <iostream>
#include <vector>
#include <time.h>

int main()
{
const int s = 100000000;
std::vector<int> vec(s,1);

clock_t start,endt;

start = clock();
for (int i=0; i<s; ++i) vec += 2;
endt = clock();
std::cout << "index : time = " << endt-start << '\n';

start = clock();
for (std::vector<int>::iterator i=vec.begin(); i!=vec.end(); ++i) *i += 2;
endt = clock();
std::cout << "iterator : time = " << endt-start << '\n';

int* p = &(vec[0]);
start = clock();
for (int i=0; i<s; ++i) *(p+i) += 2;
endt = clock();
std::cout << "pointer : time = " << endt-start << '\n';

return 0;
}

--- end code ---

Compiled with default optimisation (gcc 4.1.1 on linux x86-64):

index : time = 3240000
iterator : time = 3140000
pointer : time = 640000

Compiled with -O1:

index : time = 270000
iterator : time = 230000
pointer : time = 230000

Compiled with -O3:

index : time = 230000
iterator : time = 220000
pointer : time = 230000
 
V

Victor Bazarov

Lionel said:
On Wed, 20 Dec 2006 08:48:27 -0500, Victor Bazarov wrote:

/.../


...except if you are certain they will be magicked away by your
compiler.

Ok, here's a small test program:

--- begin code ---
[..]
--- end code ---

Compiled with default optimisation (gcc 4.1.1 on linux x86-64):
[..]

I think you're missing the point. If you want to discuss the
capabilities of any particular compiler to perform some function
call optimisations, the newsgroup dedicated to that compiler is
probably a better place. Here we talk C++ *in general*, and *in
general* nothing is "magicked away". That's all.

V
 
L

Lionel B

Lionel said:
On Wed, 20 Dec 2006 08:48:27 -0500, Victor Bazarov wrote:

/.../

Bottomline: avoid function calls like the plague!

...except if you are certain they will be magicked away by your
compiler.

Ok, here's a small test program:

--- begin code ---
[..]
--- end code ---

Compiled with default optimisation (gcc 4.1.1 on linux x86-64):
[..]

I think you're missing the point. If you want to discuss the
capabilities of any particular compiler to perform some function
call optimisations,

I don't. I only included that for information. The point I would like
to make is that I would not wilfully obfuscate my code with pointers where
indexing or iterators are more appropriate, for the sole purpose of
second-guessing compiler optimisation capabilities - and this seems to be
exactly what you are suggesting when you say "avoid function calls like the
plague". To me that is premature optimisation pure and simple.
 
M

Mathias Gaunard

Victor said:
Tested with what optimization settings? This code has to instantiate
the vector<int>::iterator vec.size() times (since you used i++ and not
++i). Besides, it has to compare the iterators, it has to dereference
the 'i'. Don't just jump to conclusions. Didn't I say I profiled
that sort of code enough to claim pointers are the fastest?

Modern compilers generate exactly the same assembler when using vector
iterators or pointers.
 
R

Rolf Magnus

Victor said:
Rolf said:
[..] Last time I measured
iterating a vector, indexing was generally slower than using a
pointer or an iterator and increment it in the loop. No difference
between indexing using a pointer and using vector's operator[].

Not true.

Well, it's truely what I measured.
Indexing using a pointer is still faster because it does
not involve a function call. That's the whole point!

This happens only if I switch optimizations off completely. Size
optimitation - as you suggested - isn't sufficient. I wouldn't expect
anything different, since the operator usually just contains nothing more
than the pointer indexing, and the code for that is smaller than a function
call, so I would expect a size optimizing compiler to inline the operator,
producing the exact same code as the direct pointer indexing.
 
P

peter koch

Victor Bazarov skrev:
[snip]
To make a real case of inline versus non-inline, you _must_ place your
non-inlined member functions in a separate translation unit. Otherwise
any decent compiler should be able to simply inline all your calls if
it can see the actual body.

I _know_ that if a function is not inlined, it's gonna costcha. Build
your project with optimization for size, use 'std::vector' and its
indexing argument, and you'll notice that if you switch to indexing
from a pointer to the first element instead, you'll get plenty of
performance improvement. Up to ten times, as a matter of fact, in
some cases.
Your argument is valid for a specific compiler and a specific
optimisation setting and not very generic. On most compilers and with
most optimisation-settings, there should be no difference at all.
Actually, some advocate using indexing rather than pointer-iterations
for raw vectors, claiming better optimisation capabilities due to fewer
aliasing issues for the compiler.

/Peter
 
R

Rolf Magnus

Victor said:
Compiled with default optimisation (gcc 4.1.1 on linux x86-64):
[..]

I think you're missing the point. If you want to discuss the
capabilities of any particular compiler to perform some function
call optimisations, the newsgroup dedicated to that compiler is
probably a better place. Here we talk C++ *in general*, and *in
general* nothing is "magicked away". That's all.

Well, *in general*, you can't do any optimizations, because C++ doesn't
specify how much time the generated code must take for a specific piece of
source code. So if you suggest any optimization techniques, you have to
look at the real world, not just at the standard.
 
B

Bo Persson

Victor said:
Thomas said:
Hello everybody,
currently I am tweaking a bit on one of my projects and wondered
how high are the costs for calling a (member) function. I tried
some possibilites and come to a suprising result.
Maybe someone can explain to me what I might improve on my test or
how come this results occured?
For testing I commented different lines out. Code see below.

Regards,
Thomas Kowalski



// MS C++ 8.0
// 1 Billion additions:
// simple methode call: 2134 ms, 2123 ms,
// inline call: 2037 ms, 2053 ms, 2073 ms
// simple methode ref param: 2153 ms, 2034 ms, 2093 ms
// addition no methode 2034 ms, 2037 ms, 2032 ms
// virtual funtion call ~2100 ms
// function pointer 7060 ms, 7260 ms, 7260 ms

// Results with gcc 3.3.6 are similar but function pointer are as
fast as normal calls

[..]

To make a real case of inline versus non-inline, you _must_ place
your non-inlined member functions in a separate translation unit.
Otherwise any decent compiler should be able to simply inline all
your calls if it can see the actual body.

With the compiler used here that won't help, as it is perfectly capable of
inlining over module boundaries.

Writing a synthetic benchmark for a global optimizer is extremely difficult.
:)


Bo Persson
 
V

Victor Bazarov

Rolf said:
Victor said:
Compiled with default optimisation (gcc 4.1.1 on linux x86-64):
[..]

I think you're missing the point. If you want to discuss the
capabilities of any particular compiler to perform some function
call optimisations, the newsgroup dedicated to that compiler is
probably a better place. Here we talk C++ *in general*, and *in
general* nothing is "magicked away". That's all.

Well, *in general*, you can't do any optimizations, because C++
doesn't specify how much time the generated code must take for a
specific piece of source code. So if you suggest any optimization
techniques, you have to look at the real world, not just at the
standard.

*In general* if one's concerned with overhead of calling a function,
one should inline the function body. Whether your compiler is up
to snuff for that or you do it manually does not matter. In general,
however, you can't rely on your compiler, so doing it manually is
your only *generally* good approach. Are you suggesting that, given

int foo(int a, int b)
{
return a+b;
}
int bar(int a, int b)
{
return foo(a+b);
}

, calling 'bar' *can* be faster than calling 'foo'?

V
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,535
Members
45,007
Latest member
obedient dusk

Latest Threads

Top