Why pass doubles as const references?

Discussion in 'C++' started by army1987, Feb 10, 2013.

  1. army1987

    army1987 Guest

    Is there any good reason to declare a function parameter as `const double
    &foo` rather than just `double foo`? I can see the point of that when
    passing a very large object, but with a double I'd expect any improvement
    in performance to be negligible. I've seen code using the former, but I
    guess that's because it was translated from Fortran, where all function
    arguments are passed by reference -- or am I missing something?

    --
    [ T H I S S P A C E I S F O R R E N T ]
    Too little culture makes us ignorant; too much makes us mad.
    -- fathermckenzie of it.cultura.linguistica.italiano
    <http://xkcd.com/397/>
     
    army1987, Feb 10, 2013
    #1

  2. Ian Collins

    Ian Collins Guest

    army1987 wrote:
    > Is there any good reason to declare a function parameter as `const double
    > &foo` rather than just `double foo`? I can see the point of that when
    > passing a very large object, but with a double I'd expect any improvement
    > in performance to be negligible. I've seen code using the former, but I
    > guess that's because it was translated from Fortran, where all function
    > arguments are passed by reference -- or am I missing something?


    On most current systems, I would expect passing a double by const
    reference to decrease performance (building the reference) rather than
    increase it.

    --
    Ian Collins
     
    Ian Collins, Feb 10, 2013
    #2

  3. Rui Maciel

    Rui Maciel Guest

    army1987 wrote:

    > Is there any good reason to declare a function parameter as `const double
    > &foo` rather than just `double foo`? I can see the point of that when
    > passing a very large object, but with a double I'd expect any improvement
    > in performance to be negligible. I've seen code using the former, but I
    > guess that's because it was translated from Fortran, where all function
    > arguments are passed by reference -- or am I missing something?


    On architectures where pointers are 64 bits wide there is no point in
    passing primitives as const references. Doing so even introduces a
    performance penalty, because passing a double is just as expensive as
    passing a pointer, and a reference implies an extra indirection.

    If I'm not mistaken, this point is covered in one of Scott Meyers's
    Effective C++ books.


    Rui Maciel
     
    Rui Maciel, Feb 10, 2013
    #3
  4. Victor Bazarov

    Victor Bazarov Guest

    On 2/10/2013 3:52 PM, army1987 wrote:
    > Is there any good reason to declare a function parameter as `const double
    > &foo` rather than just `double foo`? I can see the point of that when
    > passing a very large object, but with a double I'd expect any improvement
    > in performance to be negligible. I've seen code using the former, but I
    > guess that's because it was translated from Fortran, where all function
    > arguments are passed by reference -- or am I missing something?


    Perhaps you're missing the age of the code in question. Fifteen years
    ago, passing a double by const reference would have made a noticeable
    difference compared to passing by value. Not anymore, most likely.

    V
    --
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Feb 11, 2013
    #4
  5. Öö Tiib

    Öö Tiib Guest

    On Sunday, 10 February 2013 22:52:26 UTC+2, army1987 wrote:
    > Is there any good reason to declare a function parameter as `const double
    > &foo` rather than just `double foo`?


    There can be good reasons. For example, the function may be one of a set
    of overloads that accept const& for several class types plus double.
    Making its signature different from the other overloads may (or may not)
    cause subtle difficulties of usage from a template (say, when picking a
    pointer to that overload).

    > I can see the point of that when passing a very large object, but with a
    > double I'd expect any improvement in performance to be negligible.


    Most likely it does not affect performance at all either way. Both ways
    you can pass billions of parameters per second. If it is a complex
    algorithm, then the performance of parameter passing does not affect
    overall performance by any noticeable percentage. If it is a trivial
    algorithm, then it is often inlined, and so the parameters won't be
    passed at all.

    > I've seen code using the former, but I guess that's because it was
    > translated from Fortran, where all function arguments are passed by
    > reference -- or am I missing something?


    That can be another good reason. Most code generators/translators
    produce code that (in some circumstances) contains some overhead.
    For example, I have seen a switch with only a default case in generated
    code. It looks nonsensical and feels wasteful, but in practice the
    compiler later optimizes it out, and so the perceived "inefficiency"
    does not manifest itself.
     
    Öö Tiib, Feb 11, 2013
    #5
  6. Rui Maciel

    Rui Maciel Guest

    Öö Tiib wrote:

    > Most likely it does not affect performance at all either way. Both ways
    > you can pass billions of parameters per second. If it is a complex
    > algorithm, then the performance of parameter passing does not affect
    > overall performance by any noticeable percentage. If it is a trivial
    > algorithm, then it is often inlined, and so the parameters won't be
    > passed at all.


    <example>
    rui@kubuntu:tmp$ cat main.c++
    #include <ctime>
    #include <iostream>

    double count = 0;

    void value(double foo)
    {
        count += foo;
    }

    void reference(double const &foo)
    {
        count += foo;
    }

    int main(void)
    {
        const int max = 100000000;
        clock_t t = clock();

        count = 0;
        for (int i = 0; i < max; ++i)
        {
            value(1.0f);
        }

        std::cout << "time pass by value: " << clock() - t << std::endl;

        t = clock();
        count = 0;
        for (int i = 0; i < max; ++i)
        {
            reference(1.0f);
        }

        std::cout << "time pass by reference: " << clock() - t << std::endl;

        return 0;
    }

    rui@kubuntu:tmp$ g++ main.c++ && ./a.out
    time pass by value: 640000
    time pass by reference: 1670000
    </example>


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #6
  7. Ian Collins

    Ian Collins Guest

    Rui Maciel wrote:
    > Öö Tiib wrote:
    >
    >> Most likely it does not affect performance at all either way. Both ways
    >> you can pass billions of parameters per second. If it is a complex
    >> algorithm, then the performance of parameter passing does not affect
    >> overall performance by any noticeable percentage. If it is a trivial
    >> algorithm, then it is often inlined, and so the parameters won't be
    >> passed at all.

    >
    > <example>
    > rui@kubuntu:tmp$ cat main.c++
    > #include <ctime>
    > #include <iostream>
    >
    > double count = 0;
    >
    > void value(double foo)
    > {
    >     count += foo;
    > }
    >
    > void reference(double const &foo)
    > {
    >     count += foo;
    > }
    >
    > int main(void)
    > {
    >     const int max = 100000000;
    >     clock_t t = clock();
    >
    >     count = 0;
    >     for (int i = 0; i < max; ++i)
    >     {
    >         value(1.0f);
    >     }
    >
    >     std::cout << "time pass by value: " << clock() - t << std::endl;
    >
    >     t = clock();
    >     count = 0;
    >     for (int i = 0; i < max; ++i)
    >     {
    >         reference(1.0f);
    >     }
    >
    >     std::cout << "time pass by reference: " << clock() - t << std::endl;
    >
    >     return 0;
    > }
    >
    > rui@kubuntu:tmp$ g++ main.c++ && ./a.out
    > time pass by value: 640000
    > time pass by reference: 1670000


    That's what I would have expected; however, on a reasonably quick i7
    (with an extra 0 in max):

    32 bit:

    g++ x.cc && ./a.out
    time pass by value: 7510000
    time pass by reference: 2700000

    64 bit:

    g++ x.cc -m64 && ./a.out
    time pass by value: 2440000
    time pass by reference: 2760000

    With a little optimisation:

    g++ x.cc -m64 -O1 && ./a.out
    time pass by value: 2410000
    time pass by reference: 2410000

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #7
  8. Öö Tiib

    Öö Tiib Guest

    On Monday, 11 February 2013 21:27:49 UTC+2, Scott Lurndal wrote:
    > Ian Collins <> writes:
    > >With a little optimisation:
    > >
    > >g++ x.cc -m64 -O1 && ./a.out
    > >time pass by value: 2410000
    > >time pass by reference: 2410000

    >
    > Which completely optimizes out (eliminates) both function calls
    > (reference and value).


    Nope, it inlines those. It cannot so easily optimize out summing into a
    global with external linkage. Where do you think those 2.4 seconds went?
    Inlining is what I predicted. A billion cycles took less than 3 seconds
    unoptimized as well, and that on only one core of a quad-core i7. It is
    unlikely that any of this matters for the performance of a practical
    application. Just acquiring a billion meaningful doubles from any medium
    (including RAM) is far more expensive.
     
    Öö Tiib, Feb 11, 2013
    #8
  9. Ian Collins

    Ian Collins Guest

    Scott Lurndal wrote:
    > Ian Collins <> writes:
    >
    >> With a little optimisation:
    >>
    >> g++ x.cc -m64 -O1 && ./a.out
    >> time pass by value: 2410000
    >> time pass by reference: 2410000
    >>

    >
    > Which completely optimizes out (eliminates) both function calls (reference and value).


    So nothing takes 4.8 seconds to execute? The calls are still made, the
    function bodies are optimised.

    This is what happens when the function calls are optimised away:

    CC x.cc -fast -m64 && ./a.out
    time pass by value: 0
    time pass by reference: 0

    :)

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #9
  10. Ian Collins

    Ian Collins Guest

    Scott Lurndal wrote:
    > =?ISO-8859-1?Q?=D6=F6_Tiib?= <> writes:
    >> On Monday, 11 February 2013 21:27:49 UTC+2, Scott Lurndal wrote:
    >>> Ian Collins <> writes:
    >>>> With a little optimisation:
    >>>>
    >>>> g++ x.cc -m64 -O1 && ./a.out
    >>>> time pass by value: 2410000
    >>>> time pass by reference: 2410000
    >>>
    >>> Which completely optimizes out (eliminates) both function calls
    >>> (reference and value).

    >>
    >> Nope, it inlines those.

    >
    > It optimizes them out. There is no 'CALL' instruction.
    >
    > It does that by inlining the functions, so there is no function call.


    I'm not so daft as to post something without checking first. The first
    loop is:

    call clock
    movq $0, count(%rip)
    movl $1000000000, %ebx
    .L7:
    movsd .LC1(%rip), %xmm0
    call _Z5valued
    subl $1, %ebx
    jne .L7

    The optimised value function is:

    .globl _Z5valued
    .type _Z5valued, @function
    _Z5valued:
    .LFB961:
    addsd count(%rip), %xmm0
    movsd %xmm0, count(%rip)
    ret

    Unoptimised:

    .globl _Z5valued
    .type _Z5valued, @function
    _Z5valued:
    .LFB961:
    pushq %rbp
    .LCFI3:
    movq %rsp, %rbp
    .LCFI4:
    movsd %xmm0, -8(%rbp)
    movsd count(%rip), %xmm0
    addsd -8(%rbp), %xmm0
    movsd %xmm0, count(%rip)
    leave
    .LCFI5:
    ret

    Which looks like a typical x64 stack frame optimisation.

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #10
  11. Jorgen Grahn

    Jorgen Grahn Guest

    On Sun, 2013-02-10, Ian Collins wrote:
    > army1987 wrote:
    >> Is there any good reason to declare a function parameter as `const double
    >> &foo` rather than just `double foo`? I can see the point of that when
    >> passing a very large object, but with a double I'd expect any improvement
    >> in performance to be negligible. I've seen code using the former, but I
    >> guess that's because it was translated from Fortran, where all function
    >> arguments are passed by reference -- or am I missing something?

    >
    > On most current systems, I would expect the performance to decrease
    > (building the reference) rather than increase passing a double by const
    > reference.


    Wouldn't the expensive part be dealing with aliasing? E.g.

    void fred();   // opaque: it might modify whatever bar refers to

    void foo(const double& bar) {
        double baz = bar;
        fred();
        baz += bar;   // bar must be re-read here
        ...
    }

    can't just assume fred() doesn't modify bar, in the general case.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
     
    Jorgen Grahn, Feb 11, 2013
    #11
  12. Rui Maciel

    Rui Maciel Guest

    Öö Tiib wrote:

    > Nope, it inlines those. It cannot so easily optimize out summing into a
    > global with external linkage. Where do you think those 2.4 seconds went?
    > Inlining is what I predicted. A billion cycles took less than 3 seconds
    > unoptimized as well, and that on only one core of a quad-core i7. It is
    > unlikely that any of this matters for the performance of a practical
    > application. Just acquiring a billion meaningful doubles from any medium
    > (including RAM) is far more expensive.


    You are assuming that a very specific corner case is somehow the rule, which
    is a bad assumption to make. Just because a compiler can, as a corner case,
    optimize away pure functions, it doesn't mean that all possible and
    conceivable function calls will be optimized away. For instance, the corner
    case you are counting on simply doesn't happen if the functions are a part
    of a library.

    <code>
    rui@kubuntu:tmp$ cat main.c++

    double count = 0;

    void value(double foo)
    {
        count += foo;
    }

    void reference(double const &foo)
    {
        count += foo;
    }
    </code>

    The following instructions are obtained with -O1, -O2, and -O3:

    <snip>
    _Z5valued:
    .LFB1006:
    .cfi_startproc
    addsd count(%rip), %xmm0
    movsd %xmm0, count(%rip)
    ret
    .cfi_endproc

    // snip
    _Z9referenceRKd:
    .LFB1007:
    .cfi_startproc
    movsd count(%rip), %xmm0
    addsd (%rdi), %xmm0
    movsd %xmm0, count(%rip)
    ret
    .cfi_endproc
    </snip>



    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #12
  13. Rui Maciel

    Rui Maciel Guest

    Ian Collins wrote:

    > That's what I would have expected; however, on a reasonably quick i7
    > (with an extra 0 in max):
    >
    > 32 bit:
    >
    > g++ x.cc && ./a.out
    > time pass by value: 7510000
    > time pass by reference: 2700000
    >
    > 64 bit:
    >
    > g++ x.cc -m64 && ./a.out
    > time pass by value: 2440000
    > time pass by reference: 2760000
    >
    > With a little optimisation:
    >
    > g++ x.cc -m64 -O1 && ./a.out
    > time pass by value: 2410000
    > time pass by reference: 2410000


    <example>
    rui@kubuntu:tmp$ g++ -m64 -O1 main.c++ && ./a.out
    time: 520000
    time: 590000
    </example>


    Here's a dump of the relevant assembly bits:

    <example>
    rui@kubuntu:tmp$ g++ -m64 -O1 main.c++ -S
    rui@kubuntu:tmp$ cat main.s

    // snip

    _Z5valued:
    .LFB1006:
    .cfi_startproc
    addsd count(%rip), %xmm0
    movsd %xmm0, count(%rip)
    ret
    .cfi_endproc

    // snip

    _Z9referenceRKd:
    .LFB1007:
    .cfi_startproc
    movsd count(%rip), %xmm0
    addsd (%rdi), %xmm0
    movsd %xmm0, count(%rip)
    ret
    .cfi_endproc

    // snip
    </example>

    The extra instruction included in reference() represents the pointer
    dereference that is expected when passing a parameter by reference.


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #13
  14. Ian Collins

    Ian Collins Guest

    Rui Maciel wrote:
    > Öö Tiib wrote:
    >
    >> Nope, it inlines those. It cannot so easily optimize out summing into a
    >> global with external linkage. Where do you think those 2.4 seconds went?
    >> Inlining is what I predicted. A billion cycles took less than 3 seconds
    >> unoptimized as well, and that on only one core of a quad-core i7. It is
    >> unlikely that any of this matters for the performance of a practical
    >> application. Just acquiring a billion meaningful doubles from any medium
    >> (including RAM) is far more expensive.

    >
    > You are assuming that a very specific corner case is somehow the rule, which
    > is a bad assumption to make. Just because a compiler can, as a corner case,
    > optimize away pure functions, it doesn't mean that all possible and
    > conceivable function calls will be optimized away. For instance, the corner
    > case you are counting on simply doesn't happen if the functions are a part
    > of a library.


    It certainly isn't a corner case. The compiler is free to inline any
    functions it can see.

    > The following instructions are obtained with -O1, -O2, and -O3:
    >
    > <snip>
    > _Z5valued:
    > .LFB1006:
    > .cfi_startproc
    > addsd count(%rip), %xmm0
    > movsd %xmm0, count(%rip)
    > ret
    > .cfi_endproc
    >
    > // snip
    > _Z9referenceRKd:
    > .LFB1007:
    > .cfi_startproc
    > movsd count(%rip), %xmm0
    > addsd (%rdi), %xmm0
    > movsd %xmm0, count(%rip)
    > ret
    > .cfi_endproc
    > </snip>



    The functions will be generated, but they are not necessarily called.
    Check the code for main with -O3.

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #14
  15. Rui Maciel

    Rui Maciel Guest

    Ian Collins wrote:

    > It certainly isn't a corner case. The compiler is free to inline any
    > functions it can see.


    Yeah, it's a corner case. You simply can't assume that every function is a
    pure function that will always be inlined under every conceivable scenario.
    After all, where does the C++ standard mandate that?

    You can only count on it if you invest your time making sure that a specific
    compiler will be able to compile a specific function within your project to
    match your specific requirements, but this is way past C++'s territory and
    firmly within platform and implementation-specifics.


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #15
  16. Ian Collins

    Ian Collins Guest

    Rui Maciel wrote:
    > Ian Collins wrote:
    >
    >> It certainly isn't a corner case. The compiler is free to inline any
    >> functions it can see.

    >
    > Yeah, it's a corner case. You simply can't assume that every function is a
    > pure function that will always be inlined under every conceivable scenario.
    > After all, where does the C++ standard mandate that?


    If it's a corner case, most code lives in a dodecahedron!

    Who said a function is always inlined?

    > You can only count on it if you invest your time making sure that a specific
    > compiler will be able to compile a specific function within your project to
    > match your specific requirements, but this is way past C++'s territory and
    > firmly within platform and implementation-specifics.


    Most C++ code relies on the inlining of trivial functions; it's at the
    heart of the language. Would you expect every call to std::vector's
    operator[] to involve an actual call?

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #16
  17. Rui Maciel

    Rui Maciel Guest

    Ian Collins wrote:

    >>> It certainly isn't a corner case. The compiler is free to inline any
    >>> functions it can see.

    >>
    >> Yeah, it's a corner case. You simply can't assume that every function is
    >> a pure function that will always be inlined under every conceivable
    >> scenario. After all, where does the C++ standard mandate that?

    >
    > If it's a corner case, most code lives in a dodecahedron!


    "Most" is a bit of a weasel word. Nevertheless, even if you actually
    believe that all object code consists of a long winded opcode dump that is
    free from any function call, it is necessary to at least acknowledge the
    existence of shared libraries. It's a bit hard to optimize away code which
    is linked only dynamically.

    But this is way beyond the realm of C++.


    > Who said a function is always inlined?


    I certainly didn't say that.


    >> You can only count on it if you invest your time making sure that a
    >> specific compiler will be able to compile a specific function within your
    >> project to match your specific requirements, but this is way past C++'s
    >> territory and firmly within platform and implementation-specifics.

    >
    > Most C++ code relies on the inlining of trivial functions; it's at the
    > heart of the language. Would you expect every call to std::vector's
    > operator[] to involve an actual call?


    Trivial functions are a small subset of the whole domain of functions. A
    corner case, if you will. No one can assume that all functions are trivial
    functions, and subsequently that all possible optimization tricks can be
    applied to all conceivable functions.


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #17
  18. Rui Maciel

    Rui Maciel Guest

    Paavo Helde wrote:

    >> <example>
    >> rui@kubuntu:tmp$ g++ -m64 -O1 main.c++ && ./a.out
    >> time: 520000
    >> time: 590000
    >>
    >> The extra instruction included in reference() represents the pointer
    >> dereference that is expected when passing a parameter by reference.

    >
    > Why -O1 and not -O2?


    Because that's what Ian Collins used.

    You are free to run the same test with -O2 or -O3, if you feel like it.
    No one is trying to hide anything from anyone. Science, and all that.


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #18
  19. Ian Collins

    Ian Collins Guest

    Rui Maciel wrote:
    > Ian Collins wrote:
    >
    >>>> It certainly isn't a corner case. The compiler is free to inline any
    >>>> functions it can see.
    >>>
    >>> Yeah, it's a corner case. You simply can't assume that every function is
    >>> a pure function that will always be inlined under every conceivable
    >>> scenario. After all, where does the C++ standard mandate that?

    >>
    >> If it's a corner case, most code lives in a dodecahedron!

    >
    > "Most" is a bit of a weasel word. Nevertheless, even if you actually
    > believe that all object code consists of a long winded opcode dump that is
    > free from any function call,


    Where did I say I did?

    >>> You can only count on it if you invest your time making sure that a
    >>> specific compiler will be able to compile a specific function within your
    >>> project to match your specific requirements, but this is way past C++'s
    >>> territory and firmly within platform and implementation-specifics.

    >>
    >> Most C++ code relies on the inlining of trivial functions; it's at the
    >> heart of the language. Would you expect every call to std::vector's
    >> operator[] to involve an actual call?

    >
    > Trivial functions are a small subset of the whole domain of functions. A
    > corner case, if you will.


    I don't think you can apply the term "corner case" to a large part of
    the standard library!

    > No one can assume that all functions are trivial
    > functions, and subsequently that all possible optimization tricks can be
    > applied to all conceivable functions.


    I'm sure no one does.

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #19
  20. Öö Tiib

    Öö Tiib Guest

    On Monday, 11 February 2013 22:52:30 UTC+2, Rui Maciel wrote:
    > Öö Tiib wrote:
    > > Nope, it inlines those. It cannot so easily optimize out summing into a
    > > global with external linkage. Where do you think those 2.4 seconds went?
    > > Inlining is what I predicted. A billion cycles took less than 3 seconds
    > > unoptimized as well, and that on only one core of a quad-core i7. It is
    > > unlikely that any of this matters for the performance of a practical
    > > application. Just acquiring a billion meaningful doubles from any medium
    > > (including RAM) is far more expensive.

    >
    > You are assuming that a very specific corner case is somehow the rule, which
    > is a bad assumption to make.


    I assume nothing. The test code demonstrating that oh-so-very-special
    corner case was posted by you. ;)

    > Just because a compiler can, as a corner case,
    > optimize away pure functions, it doesn't mean that all possible and
    > conceivable function calls will be optimized away. For instance, the corner
    > case you are counting on simply doesn't happen if the functions are a part
    > of a library.


    I claimed that it does not likely matter. What test demonstrates that it
    does? Stack operations (passing parameters) and indirection to a value in
    cache are so fast that they did not matter much even with older hardware
    and compilers. Modern hardware does them in parallel (with the likely
    floating point operation) in the pipeline, so the overhead is zero; the
    difference is maybe 5% more power consumption, in the case of so tight a
    loop calling so trivial a function that it screams for inlining anyway.
     
    Öö Tiib, Feb 11, 2013
    #20
