Relative cost of operations

aitorf666 · Feb 2, 2008

Hi!
I have been looking for a comparison table of the relative cost of
operations in C++ because it can be very useful for all. I have found
this:

printf/scanf 1000
new/delete 800
trig. function 500
floating point 100
and so on...

But it doesn´t tell anything about some C++ features, like virtual
functions.
What is the cost of calling a virtual function comparing with a
ordinary function?
Could anyone tell me where can I find tables of operations cost for C+
+?

Thanks.
PD: Sorry for my English.

Alf P. Steinbach · Feb 2, 2008

* (e-mail address removed):

Hi!
I have been looking for a comparison table of the relative cost of
operations in C++ because it can be very useful for all. I have found
this:

printf/scanf 1000
new/delete 800
trig. function 500
floating point 100
and so on...

But it doesn´t tell anything about some C++ features, like virtual
functions.
What is the cost of calling a virtual function comparing with a
ordinary function?
Could anyone tell me where can I find tables of operations cost for C+
+?

Thanks.
PD: Sorry for my English.

Go to <url: http://www.open-std.org/jtc1/sc22/wg21/>, search for the
word "performance".

Cheers & hth.,

- Alf

Tomás Ó hÉilidhe · Feb 2, 2008

:

What is the cost of calling a virtual function comparing with a
ordinary function?

One more level of indirection.

Instead of having:

PointerToFunction();

you have:

PointerToVTable->PointerToFunction();

Rolf Magnus · Feb 2, 2008

TomÃ¡s Ã“ hÃ‰ilidhe said:
:

One more level of indirection.

Instead of having:

PointerToFunction();

Actually, a regular function is often not called through a pointer, but
directly.

you have:

PointerToVTable->PointerToFunction();

Actually, the common implementation is more like
PointerToObject->PointerToVTable->PointerToFunction(), so it's 3 additional
levels of indirection.

Tomás Ó hÉilidhe · Feb 2, 2008

Rolf Magnus:

Actually, a regular function is often not called through a pointer,
but directly.

I'm not sure what you mean by "called directly". When a function is
called in a program, the CPU's program counter is set to the address of
the function. I don't see how you could call the function without using
its address.
..

Actually, the common implementation is more like
PointerToObject->PointerToVTable->PointerToFunction(), so it's 3
additional levels of indirection.

If we have an object on the stack, then its address is simply an
offset from the stack pointer. So instead of having:

JUMP TO (address_of_function);

We have:

JUMP TO ( (stack_pointer + offset)->vtable->address_of_function );

Or more simply:

JUMP TO ( address_of_object->vtable->address_of_function );

I think this equates to two more levels of indirection in total. On
a CPU whose clock runs at 2 GHz, and where a cycle is about 4 clock
pulses, and where an indirecton takes 2 cycles, I think we're looking at
a delay of something like 8 nanoseconds.

Erik WikstrÃ¶m · Feb 3, 2008

Rolf Magnus:

I'm not sure what you mean by "called directly". When a function is
called in a program, the CPU's program counter is set to the address of
the function. I don't see how you could call the function without using
its address.
.

If we have an object on the stack, then its address is simply an
offset from the stack pointer. So instead of having:

No, if the object is on the stack the type is known at compile-time and
the correct function is called directly. VTables and such are only
needed when the exact type is not known at compile-time.

Tomás Ó hÉilidhe · Feb 3, 2008

=?UTF-8?B?RXJpayBXaWtzdHLDtm0=?=:

No, if the object is on the stack the type is known at compile-time and
the correct function is called directly. VTables and such are only
needed when the exact type is not known at compile-time.

Of course, you're right. So we're left with:

JUMP TO( address_of_object->vtable->address_of_function );

My previous guess of 8 nanoseconds was probably way off, it's probably more
like 10's of nanoseconds if you take into account adding the offsets and so
forth.

Jerry Coffin · Feb 3, 2008

Hi!
I have been looking for a comparison table of the relative cost of
operations in C++ because it can be very useful for all. I have found
this:

printf/scanf 1000
new/delete 800
trig. function 500
floating point 100
and so on...

While that may be of some use, it's only an extremely rough guideline at
very best. Just for example, the time for a print or scanf varies widely
based upon the number of conversions involved, the amount of output
produced, and the speed of the device to which the output is written --
it can easily vary by a factor of thousands.

But it doesnÂ´t tell anything about some C++ features, like virtual
functions.
What is the cost of calling a virtual function comparing with a
ordinary function?

You can't reasonably compare them directly. If you want to compare
costs, you need to compare a virtual function to something like a switch
statement that calls any one of a number of functions, or perhaps to
calling a function via a pointer.

When you make such a comparison, the time (of course) depends on the
compiler, but with the modern compilers I've tested, there's essentially
no overhead. For example, here's some output from Joe Orost's Bench++
benchmark:

Test Name: F000006 Class Name: Style
CPU Time: 1.73 nanoseconds plus or minus 0.0866
Wall/CPU: 1.00 ratio. Iteration Count:
1677721600
Test Description:
Time to test a global using a 10-way switch statement

Test Name: F000007 Class Name: Style
CPU Time: 2.73 nanoseconds plus or minus 0.136
Wall/CPU: 1.00 ratio. Iteration Count:
1677721600
Test Description:
Time to test a global using a 10-way sparse switch statement

Test Name: F000008 Class Name: Style
CPU Time: 1.94 nanoseconds plus or minus 0.0969
Wall/CPU: 1.00 ratio. Iteration Count:
1677721600
Test Description:
Time to test a global using a 10-way virtual function class

As you can see, the time for the virtual function version is quite
competitive -- with dense values, the switch statement is marginally
faster, but with sparse values, the switch statement is noticeably
slower.

Although the values vary somewhat depending on the compiler and options
I use, that trend has remained roughly constant.

Could anyone tell me where can I find tables of operations cost for C+
+?

My advice would be to download (or write) and run some code to test the
operations you care about. Writing good test code is definitely a
challenge for even the most experienced programmer.

James Kanze · Feb 3, 2008

I have been looking for a comparison table of the relative
cost of operations in C++ because it can be very useful for
all. I have found this:

printf/scanf 1000
new/delete 800
trig. function 500
floating point 100
and so on...

Which is completely irrelevant anyway---a printf( "%f" ) can
easily take more than 10 times as much time as a printf( "some
text" ). And the rest will depend on the platform---on my Sun
Sparc, at work, floating point multiply and divide are faster
than integer multiply and divide, for example.

But it doesn´t tell anything about some C++ features, like
virtual functions. What is the cost of calling a virtual
function comparing with a ordinary function?

It varies greatly from one implementation to the next, so you'd
need information relative to your platform.

James Kanze · Feb 3, 2008

[...]

I think this equates to two more levels of indirection in
total. On a CPU whose clock runs at 2 GHz, and where a cycle
is about 4 clock pulses, and where an indirecton takes 2
cycles, I think we're looking at a delay of something like 8
nanoseconds.

Do you actually know of any machines like that. The machines I
know which run at 2 GHz all are capable of executing several
instructions per clock as well. Depending on the instructions,
and provided the pipeline is full, of course.

The important point with regards to virtual functions is that
the call is indirect. On some architectures (HP/PA, for
example), this fouls up branch prediction completely, and may
end up resulting in a purge of the pipeline. Virtual functions
on those machines are fairly expensive. On other machines, this
doesn't seem to be a problem.

And of course, the question is always: expensive compared to
what? You can't compare a virtual function call to a
non-virtual function call because they don't do the same thing.
So you end up comparing it to a if/else chain, or a switch. (In
cases where the if/else chain is faster, the compiler may end up
converting the virtual function call to an if, if the code is in
a tight loop, where the profiler says it will make a
difference.)

Rolf Magnus · Feb 3, 2008

TomÃ¡s Ã“ hÃ‰ilidhe said:
Rolf Magnus:

I'm not sure what you mean by "called directly". When a function is
called in a program, the CPU's program counter is set to the address of
the function. I don't see how you could call the function without using
its address.

The address is part of the call instruction itself, called an immediate
value in Assembler. If you call the function through a pointer, first the
pointer value must be read from memory, then the function is called with
that address, using a different instruction.
Of course this may vary depending on the platform.

If we have an object on the stack, then its address is simply an
offset from the stack pointer. So instead of having:

JUMP TO (address_of_function);

We have:

JUMP TO ( (stack_pointer + offset)->vtable->address_of_function );

Or more simply:

JUMP TO ( address_of_object->vtable->address_of_function );

I think this equates to two more levels of indirection in total. On
a CPU whose clock runs at 2 GHz, and where a cycle is about 4 clock
pulses, and where an indirecton takes 2 cycles, I think we're looking at
a delay of something like 8 nanoseconds.

However, that value doesn't say much. Obviously, you won't notice a
difference for a function that is not called very often. If that function
is short, and it's called a million times per second, this might be
different. Also, not every system has a clock rate of 2 Ghz. Not everyone
is programming PCs and the like.

Tomás Ó hÉilidhe · Feb 3, 2008

James Kanze:

---on my Sun
Sparc, at work, floating point multiply and divide are faster
than integer multiply and divide, for example.

Are you doing some sort of CAD graphic design? I got introduced to it
there a few days in college; we're using it do design microchips, e.g.
where to place n-type regions and so forth. What kinds of stuff is this
machine good for?

Dominic Connor, Quant Headhunter · Feb 3, 2008

It's also worth remembering that not only do some costs change between
compiler versions, but vary during the life of the program.
new/delete and malloc/free can suffer from fragmentation, and unlike
trig functions altering their use a small amount may significantly
alter their cost.
Of course, you might not get any change at all, that's what makes this
so fun

Rolf Magnus · Feb 3, 2008

It's also worth remembering that not only do some costs change between
compiler versions, but vary during the life of the program.
new/delete and malloc/free can suffer from fragmentation, and unlike
trig functions altering their use a small amount may significantly
alter their cost.

You also might happen to use multiple threads, where new/delete need some
thread synchronization, while trig functions don't. OTOH, some memory
allocators have specific regions for objects of common size, which makes
allocating those significantly quicker. On a system without FPU, a trig
function might in turn easily take more than 10 times as much time than on
one with FPU.

Dominic Connor, Quant Headhunter · Feb 3, 2008

You also might happen to use multiple threads, where new/delete need some
thread synchronization, while trig functions don't.

Very good point, the period that a shared resource is locked is going
to affect performance in a very non-linear fashion.
In small programs, it can be trivial, but suddenly you fall off a
cliff after a trivial modification.

James Kanze · Feb 3, 2008

James Kanze:

Are you doing some sort of CAD graphic design?
No.

I got introduced to it there a few days in college; we're
using it do design microchips, e.g. where to place n-type
regions and so forth. What kinds of stuff is this machine good
for?

It's a good general purpose machine. In our case, it gets the
preference over a PC with Linux mainly because of its IO
bandwidth. In general, Solaris is also more stable and more
reliable than Linux or Windows.

Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
help with small example of file operations	9	Jun 1, 2011
Anything in the language to support better recognition of vector operations?	5	Nov 8, 2010
Working on mobile css menu with plenty of frustration!	2	Dec 29, 2022
C - using malloc to dynamically assign memory for a array of strings	0	Nov 2, 2008
'Needless flexibilities' and structured records [very long]	10	Mar 15, 2013
Weird usage of printf/scanf? Why do they work?	15	Sep 7, 2009
comp.lang.c FAQ list Table of Contents	0	Jan 12, 2008

Relative cost of operations

aitorf666

Alf P. Steinbach

Tomás Ó hÉilidhe

Rolf Magnus

Tomás Ó hÉilidhe

Erik WikstrÃ¶m

Tomás Ó hÉilidhe

Jerry Coffin

James Kanze

James Kanze

Rolf Magnus

Tomás Ó hÉilidhe

Dominic Connor, Quant Headhunter

Rolf Magnus

Dominic Connor, Quant Headhunter

James Kanze

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads