Why pass doubles as const references?

Discussion in 'C++' started by army1987, Feb 10, 2013.

  1. army1987

    army1987 Guest

    Is there any good reason to declare a function parameter as `const double
    &foo` rather than just `double foo`? I can see the point of that when
    passing a very large object, but with a double I'd expect any improvement
    in performance to be negligible. I've seen code using the former, but I
    guess that's because it was translated from Fortran, where all function
    arguments are passed by reference -- or am I missing something?

    --
    [ T H I S S P A C E I S F O R R E N T ]
    Too little culture makes us ignorant; too much makes us mad.
    -- fathermckenzie of it.cultura.linguistica.italiano
    <http://xkcd.com/397/>
     
    army1987, Feb 10, 2013
    #1

  2. Ian Collins

    Ian Collins Guest

    army1987 wrote:
    > Is there any good reason to declare a function parameter as `const double
    > &foo` rather than just `double foo`? I can see the point of that when
    > passing a very large object, but with a double I'd expect any improvement
    > in performance to be negligible. I've seen code using the former, but I
    > guess that's because it was translated from Fortran, where all function
    > arguments are passed by reference -- or am I missing something?


    On most current systems, I would expect passing a double by const
    reference to decrease performance (building the reference) rather than
    increase it.

    --
    Ian Collins
     
    Ian Collins, Feb 10, 2013
    #2

  3. Rui Maciel

    Rui Maciel Guest

    army1987 wrote:

    > Is there any good reason to declare a function parameter as `const double
    > &foo` rather than just `double foo`? I can see the point of that when
    > passing a very large object, but with a double I'd expect any improvement
    > in performance to be negligible. I've seen code using the former, but I
    > guess that's because it was translated from Fortran, where all function
    > arguments are passed by reference -- or am I missing something?


    On architectures where pointers are 64 bits wide there is no point in
    passing primitives as const references. Doing so even introduces a
    performance penalty, because passing a double is just as expensive as
    passing a pointer, and a reference implies an extra indirection.

    If I'm not mistaken, this point is covered in one of Scott Meyers's
    Effective C++ books.


    Rui Maciel
     
    Rui Maciel, Feb 10, 2013
    #3
  4. Victor Bazarov

    Victor Bazarov Guest

    On 2/10/2013 3:52 PM, army1987 wrote:
    > Is there any good reason to declare a function parameter as `const double
    > &foo` rather than just `double foo`? I can see the point of that when
    > passing a very large object, but with a double I'd expect any improvement
    > in performance to be negligible. I've seen code using the former, but I
    > guess that's because it was translated from Fortran, where all function
    > arguments are passed by reference -- or am I missing something?


    Perhaps you're missing the age of the code in question. Fifteen years
    ago, passing a double by const reference would have made a noticeable
    difference compared to passing by value. Not anymore, most likely.

    V
    --
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Feb 11, 2013
    #4
  5. Öö Tiib

    Öö Tiib Guest

    On Sunday, 10 February 2013 22:52:26 UTC+2, army1987 wrote:
    > Is there any good reason to declare a function parameter as `const double
    > &foo` rather than just `double foo`?


    There can be good reasons. For example, the function may be one of a set
    of overloads that accept const& for several class types plus double.
    Making its signature different from the other overloads may (or may not)
    cause subtle difficulties of usage from a template (say, when picking a
    pointer to that overload).

    > I can see the point of that when passing a very large object, but with a
    > double I'd expect any improvement in performance to be negligible.


    Most likely it does not affect performance at all either way. Both ways
    you can pass billions of parameters per second. If it is a complex
    algorithm, then the performance of parameter passing does not affect
    overall performance by any noticeable percentage. If it is a trivial
    algorithm, then it is often inlined, and so the parameters won't be
    passed at all.

    > I've seen code using the former, but I guess that's because it was
    > translated from Fortran, where all function arguments are passed by
    > reference -- or am I missing something?


    That can be another good reason. Most code generators/translators
    produce code that (in some circumstances) contains some overhead.
    For example, I have seen a switch with only a default case in generated
    code. It looks nonsensical and feels wasteful, but in practice the
    compiler later optimizes it out, and so the perceived "inefficiency"
    does not manifest itself.
     
    Öö Tiib, Feb 11, 2013
    #5
  6. Rui Maciel

    Rui Maciel Guest

    Öö Tiib wrote:

    > Most likely it does not affect performance at all either way. Both ways
    > you can pass billions of parameters per second. If it is a complex
    > algorithm, then the performance of parameter passing does not affect
    > overall performance by any noticeable percentage. If it is a trivial
    > algorithm, then it is often inlined, and so the parameters won't be
    > passed at all.


    <example>
    rui@kubuntu:tmp$ cat main.c++
    #include <ctime>
    #include <iostream>

    double count = 0;

    void value(double foo)
    {
        count += foo;
    }

    void reference(double const &foo)
    {
        count += foo;
    }

    int main(void)
    {
        const int max = 100000000;
        clock_t t = clock();

        count = 0;
        for (int i = 0; i < max; ++i)
        {
            value(1.0f);
        }

        std::cout << "time pass by value: " << clock() - t << std::endl;

        t = clock();
        count = 0;
        for (int i = 0; i < max; ++i)
        {
            reference(1.0f);
        }

        std::cout << "time pass by reference: " << clock() - t << std::endl;

        return 0;
    }

    rui@kubuntu:tmp$ g++ main.c++ && ./a.out
    time pass by value: 640000
    time pass by reference: 1670000
    </example>


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #6
  7. Ian Collins

    Ian Collins Guest

    Rui Maciel wrote:
    > Öö Tiib wrote:
    >
    >> Most likely it does not affect performance at all either way. Both ways
    >> you can pass billions of parameters per second. If it is a complex
    >> algorithm, then the performance of parameter passing does not affect
    >> overall performance by any noticeable percentage. If it is a trivial
    >> algorithm, then it is often inlined, and so the parameters won't be
    >> passed at all.

    >
    > <example>
    > rui@kubuntu:tmp$ cat main.c++
    > #include <ctime>
    > #include <iostream>
    >
    > double count = 0;
    >
    > void value(double foo)
    > {
    >     count += foo;
    > }
    >
    > void reference(double const &foo)
    > {
    >     count += foo;
    > }
    >
    > int main(void)
    > {
    >     const int max = 100000000;
    >     clock_t t = clock();
    >
    >     count = 0;
    >     for (int i = 0; i < max; ++i)
    >     {
    >         value(1.0f);
    >     }
    >
    >     std::cout << "time pass by value: " << clock() - t << std::endl;
    >
    >     t = clock();
    >     count = 0;
    >     for (int i = 0; i < max; ++i)
    >     {
    >         reference(1.0f);
    >     }
    >
    >     std::cout << "time pass by reference: " << clock() - t << std::endl;
    >
    >     return 0;
    > }
    >
    > rui@kubuntu:tmp$ g++ main.c++ && ./a.out
    > time pass by value: 640000
    > time pass by reference: 1670000


    That's what I would have expected; however, on a reasonably quick i7
    (with an extra 0 in max):

    32 bit:

    g++ x.cc && ./a.out
    time pass by value: 7510000
    time pass by reference: 2700000

    64 bit:

    g++ x.cc -m64 && ./a.out
    time pass by value: 2440000
    time pass by reference: 2760000

    With a little optimisation:

    g++ x.cc -m64 -O1 && ./a.out
    time pass by value: 2410000
    time pass by reference: 2410000

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #7
  8. Öö Tiib

    Öö Tiib Guest

    On Monday, 11 February 2013 21:27:49 UTC+2, Scott Lurndal wrote:
    > Ian Collins <> writes:
    > >With a little optimisation:
    > >
    > >g++ x.cc -m64 -O1 && ./a.out
    > >time pass by value: 2410000
    > >time pass by reference: 2410000

    >
    > Which completely optimizes out (eliminates) both function calls
    > (reference and value).


    Nope, it inlines those. It cannot so easily optimize out summing into a
    global with external linkage. Where do you think those 2.4 seconds went?
    Inlining is what I predicted. A billion cycles took less than 3 seconds
    unoptimized as well, and that on only one core of a quad-core i7. It is
    unlikely that any of this matters for the performance of a practical
    application. Just acquiring a billion meaningful doubles from any medium
    (including RAM) is far more expensive.
     
    Öö Tiib, Feb 11, 2013
    #8
  9. Ian Collins

    Ian Collins Guest

    Scott Lurndal wrote:
    > Ian Collins <> writes:
    >
    >> With a little optimisation:
    >>
    >> g++ x.cc -m64 -O1 && ./a.out
    >> time pass by value: 2410000
    >> time pass by reference: 2410000
    >>

    >
    > Which completely optimizes out (eliminates) both function calls (reference and value).


    So nothing takes 4.8 seconds to execute? The calls are still made, the
    function bodies are optimised.

    This is what happens when the function calls are optimised away:

    CC x.cc -fast -m64 && ./a.out
    time pass by value: 0
    time pass by reference: 0

    :)

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #9
  10. Ian Collins

    Ian Collins Guest

    Scott Lurndal wrote:
    > =?ISO-8859-1?Q?=D6=F6_Tiib?= <> writes:
    >> On Monday, 11 February 2013 21:27:49 UTC+2, Scott Lurndal wrote:
    >>> Ian Collins <> writes:
    >>>> With a little optimisation:
    >>>>
    >>>> g++ x.cc -m64 -O1 && ./a.out
    >>>> time pass by value: 2410000
    >>>> time pass by reference: 2410000
    >>>
    >>> Which completely optimizes out (eliminates) both function calls
    >>> (reference and value).

    >>
    >> Nope, it inlines those.

    >
    > It optimizes them out. There is no 'CALL' instruction.
    >
    > It does that by inlining the functions, so there is no function call.


    I'm not so daft as to post something without checking first. The first
    loop is:

    call clock
    movq $0, count(%rip)
    movl $1000000000, %ebx
    .L7:
    movsd .LC1(%rip), %xmm0
    call _Z5valued
    subl $1, %ebx
    jne .L7

    The optimised value function is:

    .globl _Z5valued
    .type _Z5valued, @function
    _Z5valued:
    .LFB961:
    addsd count(%rip), %xmm0
    movsd %xmm0, count(%rip)
    ret

    Unoptimised:

    .globl _Z5valued
    .type _Z5valued, @function
    _Z5valued:
    .LFB961:
    pushq %rbp
    .LCFI3:
    movq %rsp, %rbp
    .LCFI4:
    movsd %xmm0, -8(%rbp)
    movsd count(%rip), %xmm0
    addsd -8(%rbp), %xmm0
    movsd %xmm0, count(%rip)
    leave
    .LCFI5:
    ret

    Which looks like a typical x64 stack frame optimisation.

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #10
  11. Jorgen Grahn

    Jorgen Grahn Guest

    On Sun, 2013-02-10, Ian Collins wrote:
    > army1987 wrote:
    >> Is there any good reason to declare a function parameter as `const double
    >> &foo` rather than just `double foo`? I can see the point of that when
    >> passing a very large object, but with a double I'd expect any improvement
    >> in performance to be negligible. I've seen code using the former, but I
    >> guess that's because it was translated from Fortran, where all function
    >> arguments are passed by reference -- or am I missing something?

    >
    > On most current systems, I would expect the performance to decrease
    > (building the reference) rather than increase passing a double by const
    > reference.


    Wouldn't the expensive part be dealing with aliasing? E.g.

    void fred();   // opaque: it might modify whatever bar refers to

    void foo(const double& bar) {
        double baz = bar;
        fred();
        baz += bar;   // bar must be re-read here
        ...
    }

    can't just assume fred() doesn't modify bar, in the general case.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
     
    Jorgen Grahn, Feb 11, 2013
    #11
  12. Rui Maciel

    Rui Maciel Guest

    Öö Tiib wrote:

    > Nope, it inlines those. It cannot so easily optimize out summing into a
    > global with external linkage. Where do you think those 2.4 seconds went?
    > Inlining is what I predicted. A billion cycles took less than 3 seconds
    > unoptimized as well, and that on only one core of a quad-core i7. It is
    > unlikely that any of this matters for the performance of a practical
    > application. Just acquiring a billion meaningful doubles from any medium
    > (including RAM) is far more expensive.


    You are assuming that a very specific corner case is somehow the rule, which
    is a bad assumption to make. Just because a compiler can, as a corner case,
    optimize away pure functions, it doesn't mean that all possible and
    conceivable function calls will be optimized away. For instance, the corner
    case you are counting on simply doesn't happen if the functions are a part
    of a library.

    <code>
    rui@kubuntu:tmp$ cat main.c++

    double count = 0;

    void value(double foo)
    {
        count += foo;
    }

    void reference(double const &foo)
    {
        count += foo;
    }
    </code>

    The following instructions are obtained with -O1, -O2, and -O3:

    <snip>
    _Z5valued:
    .LFB1006:
    .cfi_startproc
    addsd count(%rip), %xmm0
    movsd %xmm0, count(%rip)
    ret
    .cfi_endproc

    // snip
    _Z9referenceRKd:
    .LFB1007:
    .cfi_startproc
    movsd count(%rip), %xmm0
    addsd (%rdi), %xmm0
    movsd %xmm0, count(%rip)
    ret
    .cfi_endproc
    </snip>



    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #12
  13. Rui Maciel

    Rui Maciel Guest

    Ian Collins wrote:

    > That's what I would have expected; however, on a reasonably quick i7
    > (with an extra 0 in max):
    >
    > 32 bit:
    >
    > g++ x.cc && ./a.out
    > time pass by value: 7510000
    > time pass by reference: 2700000
    >
    > 64 bit:
    >
    > g++ x.cc -m64 && ./a.out
    > time pass by value: 2440000
    > time pass by reference: 2760000
    >
    > With a little optimisation:
    >
    > g++ x.cc -m64 -O1 && ./a.out
    > time pass by value: 2410000
    > time pass by reference: 2410000


    <example>
    rui@kubuntu:tmp$ g++ -m64 -O1 main.c++ && ./a.out
    time: 520000
    time: 590000
    </example>


    Here's a dump of the relevant assembly bits:

    <example>
    rui@kubuntu:tmp$ g++ -m64 -O1 main.c++ -S
    rui@kubuntu:tmp$ cat main.s

    // snip

    _Z5valued:
    .LFB1006:
    .cfi_startproc
    addsd count(%rip), %xmm0
    movsd %xmm0, count(%rip)
    ret
    .cfi_endproc

    // snip

    _Z9referenceRKd:
    .LFB1007:
    .cfi_startproc
    movsd count(%rip), %xmm0
    addsd (%rdi), %xmm0
    movsd %xmm0, count(%rip)
    ret
    .cfi_endproc

    // snip
    </example>

    The extra instruction included in reference() represents the pointer
    dereference that is expected when passing a parameter by reference.


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #13
  14. Ian Collins

    Ian Collins Guest

    Rui Maciel wrote:
    > Öö Tiib wrote:
    >
    >> Nope, it inlines those. It cannot so easily optimize out summing into a
    >> global with external linkage. Where do you think those 2.4 seconds went?
    >> Inlining is what I predicted. A billion cycles took less than 3 seconds
    >> unoptimized as well, and that on only one core of a quad-core i7. It is
    >> unlikely that any of this matters for the performance of a practical
    >> application. Just acquiring a billion meaningful doubles from any medium
    >> (including RAM) is far more expensive.

    >
    > You are assuming that a very specific corner case is somehow the rule, which
    > is a bad assumption to make. Just because a compiler can, as a corner case,
    > optimize away pure functions, it doesn't mean that all possible and
    > conceivable function calls will be optimized away. For instance, the corner
    > case you are counting on simply doesn't happen if the functions are a part
    > of a library.


    It certainly isn't a corner case. The compiler is free to inline any
    functions it can see.

    > The following instructions are obtained with -O1, -O2, and -O3:
    >
    > <snip>
    > _Z5valued:
    > .LFB1006:
    > .cfi_startproc
    > addsd count(%rip), %xmm0
    > movsd %xmm0, count(%rip)
    > ret
    > .cfi_endproc
    >
    > // snip
    > _Z9referenceRKd:
    > .LFB1007:
    > .cfi_startproc
    > movsd count(%rip), %xmm0
    > addsd (%rdi), %xmm0
    > movsd %xmm0, count(%rip)
    > ret
    > .cfi_endproc
    > </snip>



    The functions will be generated, but they are not necessarily called.
    Check the code for main with -O3.

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #14
  15. Rui Maciel

    Rui Maciel Guest

    Ian Collins wrote:

    > It certainly isn't a corner case. The compiler is free to inline any
    > functions it can see.


    Yeah, it's a corner case. You simply can't assume that every function is a
    pure function that will always be inlined under every conceivable scenario.
    After all, where does the C++ standard mandate that?

    You can only count on it if you invest your time making sure that a specific
    compiler will be able to compile a specific function within your project to
    match your specific requirements, but this is way past C++'s territory and
    firmly within platform and implementation-specifics.


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #15
  16. Ian Collins

    Ian Collins Guest

    Rui Maciel wrote:
    > Ian Collins wrote:
    >
    >> It certainly isn't a corner case. The compiler is free to inline any
    >> functions it can see.

    >
    > Yeah, it's a corner case. You simply can't assume that every function is a
    > pure function that will always be inlined under every conceivable scenario.
    > After all, where does the C++ standard mandate that?


    If it's a corner case, most code lives in a dodecahedron!

    Who said a function is always inlined?

    > You can only count on it if you invest your time making sure that a specific
    > compiler will be able to compile a specific function within your project to
    > match your specific requirements, but this is way past C++'s territory and
    > firmly within platform and implementation-specifics.


    Most C++ code relies on the inlining of trivial functions; it's at the
    heart of the language. Would you expect every call to std::vector's
    operator[] to involve an actual call?

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #16
  17. Rui Maciel

    Rui Maciel Guest

    Ian Collins wrote:

    >>> It certainly isn't a corner case. The compiler is free to inline any
    >>> functions it can see.

    >>
    >> Yeah, it's a corner case. You simply can't assume that every function is
    >> a pure function that will always be inlined under every conceivable
    >> scenario. After all, where does the C++ standard mandate that?

    >
    > If it's a corner case, most code lives in a dodecahedron!


    "Most" is a bit of a weasel word. Nevertheless, even if you actually
    believe that all object code consists of a long winded opcode dump that is
    free from any function call, it is necessary to at least acknowledge the
    existence of shared libraries. It's a bit hard to optimize away code which
    is linked only dynamically.

    But this is way beyond the realm of C++.


    > Who said a function is always inlined?


    I certainly didn't say that.


    >> You can only count on it if you invest your time making sure that a
    >> specific compiler will be able to compile a specific function within your
    >> project to match your specific requirements, but this is way past C++'s
    >> territory and firmly within platform and implementation-specifics.

    >
    > Most C++ code relies on the inlining of trivial functions; it's at the
    > heart of the language. Would you expect every call to std::vector's
    > operator[] to involve an actual call?


    Trivial functions are a small subset of the whole domain of functions. A
    corner case, if you will. No one can assume that all functions are trivial
    functions, and subsequently that all possible optimization tricks can be
    applied to all conceivable functions.


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #17
  18. Rui Maciel

    Rui Maciel Guest

    Paavo Helde wrote:

    >> <example>
    >> rui@kubuntu:tmp$ g++ -m64 -O1 main.c++ && ./a.out
    >> time: 520000
    >> time: 590000
    >>
    >> The extra instruction included in reference() represents the pointer
    >> dereference that is expected when passing a parameter by reference.

    >
    > Why -O1 and not -O2?


    Because that's what Ian Collins used.

    You are free to run the same test with -O2 or -O3, if you feel like it.
    No one is trying to hide anything from anyone. Science, and all that.


    Rui Maciel
     
    Rui Maciel, Feb 11, 2013
    #18
  19. Ian Collins

    Ian Collins Guest

    Rui Maciel wrote:
    > Ian Collins wrote:
    >
    >>>> It certainly isn't a corner case. The compiler is free to inline any
    >>>> functions it can see.
    >>>
    >>> Yeah, it's a corner case. You simply can't assume that every function is
    >>> a pure function that will always be inlined under every conceivable
    >>> scenario. After all, where does the C++ standard mandate that?

    >>
    >> If it's a corner case, most code lives in a dodecahedron!

    >
    > "Most" is a bit of a weasel word. Nevertheless, even if you actually
    > believe that all object code consists of a long winded opcode dump that is
    > free from any function call,


    Where did I say I did?

    >>> You can only count on it if you invest your time making sure that a
    >>> specific compiler will be able to compile a specific function within your
    >>> project to match your specific requirements, but this is way past C++'s
    >>> territory and firmly within platform and implementation-specifics.

    >>
    >> Most C++ code relies on the inlining of trivial functions; it's at the
    >> heart of the language. Would you expect every call to std::vector's
    >> operator[] to involve an actual call?

    >
    > Trivial functions are a small subset of the whole domain of functions. A
    > corner case, if you will.


    I don't think you can apply the term "corner case" to a large part of
    the standard library!

    > No one can assume that all functions are trivial
    > functions, and subsequently that all possible optimization tricks can be
    > applied to all conceivable functions.


    I'm sure no one does.

    --
    Ian Collins
     
    Ian Collins, Feb 11, 2013
    #19
  20. Öö Tiib

    Öö Tiib Guest

    On Monday, 11 February 2013 22:52:30 UTC+2, Rui Maciel wrote:
    > Öö Tiib wrote:
    > > Nope, it inlines those. It cannot so easily optimize out summing into a
    > > global with external linkage. Where do you think those 2.4 seconds went?
    > > Inlining is what I predicted. A billion cycles took less than 3 seconds
    > > unoptimized as well, and that on only one core of a quad-core i7. It is
    > > unlikely that any of this matters for the performance of a practical
    > > application. Just acquiring a billion meaningful doubles from any medium
    > > (including RAM) is far more expensive.

    >
    > You are assuming that a very specific corner case is somehow the rule, which
    > is a bad assumption to make.


    I assume nothing. The test code demonstrating that oh-so-very-special
    corner case was posted by you. ;)

    > Just because a compiler can, as a corner case,
    > optimize away pure functions, it doesn't mean that all possible and
    > conceivable function calls will be optimized away. For instance, the corner
    > case you are counting on simply doesn't happen if the functions are a part
    > of a library.


    I claimed that it does not likely matter. What test demonstrates that it
    does? Stack operations (passing parameters) and indirection to a value in
    cache are so fast that they did not matter much even with older hardware
    and compilers. Modern hardware does them in parallel (with the likely
    floating point operation) in the pipeline, so the overhead is zero; the
    difference is maybe 5% more power consumption, in the case of so tight a
    loop calling so trivial a function that it screams for inlining anyway.
     
    Öö Tiib, Feb 11, 2013
    #20
