Writing efficient normal distribution function (for learning purposes).

Discussion in 'C++' started by Luca Cerone, Feb 17, 2012.

  1. Luca Cerone

    Luca Cerone Guest

    Dear all, I'm a beginner in C++ so I'm sorry if my question might seem silly.
    I'm also new to the group so I just say hello to all the people in it.

I've written a function that computes the Normal Distribution, but when comparing the time to create 1000 vectors,
each with 1000 points, I get the same performance as the analogous Matlab built-in function.
I'd like to improve my code for speed.
I'm doing this mainly to learn how to write efficient C++ code, so please don't just tell me
that libraries X and Y do this faster than my code, but help me understand why and how to improve
my code.

    I need both a single value and a vector version of this function,
    so this is how I have defined them.

    // this is the single point version of the function:
    double normdist(const double& px,const double& mu,const double& csigma){
    // evaluating the formula taken from http://en.wikipedia.org/wiki/Normal_distribution
    double val = sqrt(2*M_PI)*csigma;
    val = 1/val;
    val = val*exp(-pow(px-mu,2)/(2*pow(csigma,2)));
    return val ;
    }

    //this is the vector version. It takes as input a vector of double and gives the vector of corresponding values as output:

    vector<double> normdist(const vector<double>& vx,const double& mu,const double& csigma){
    vector<double> gv(vx.size()) ; //allocates the space for the vector gv.
    for (int i=0 ; i<vx.size() ; i++){
gv[i]=normdist(vx[i],mu,csigma);
    }
    return gv ;
    }

As I said, I'm a beginner, so I guess there is plenty of room for improving the speed and the quality of the code
(just by writing to you I have realized that the initialization of gv might not be optimal, since I create values I don't need).

    Thank you very much to all of you!
    Luca
    Luca Cerone, Feb 17, 2012
    #1

  2. Hi Luca

If you can live with fixed precision, your code will run much faster if you use
ints or __int64's and then multiply/divide as much as you need to get the
precision you need.

    Best regards
    Asger-P
    Asger Joergensen, Feb 17, 2012
    #2

  3. Luca Cerone

    lucacerone Guest

Hi Asger-P, thanks for your reply.
Let me check if I've understood: by fixed precision you mean a
number with a fixed number of digits? If so, yes, I can live with that
as long as I have 8 exact decimal digits.

> If you can live with fixed precision, your code will run much faster if you use
> ints or __int64's and then multiply/divide as much as you need to get the
> precision you need.


I'm sorry, but I don't understand. How would you suggest using ints?
What type is __int64?

Thanks a lot for your help in any case,
    Luca
    lucacerone, Feb 17, 2012
    #3
  4. Hi lucacerone

    lucacerone wrote:

    > Hi Asger-P, thanks for your reply.
> Let me check if I've understood: by fixed precision you mean a
> number with a fixed number of digits? If so, yes, I can live with that
> as long as I have 8 exact decimal digits.


    int64_t val = DoubleVal * 100000000;

Do the rest of the work with the int, and then divide by the same number
when you need the result.
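
To illustrate with a sketch (the helper names are mine, nothing standard):

#include <cstdint>

const int64_t SCALE = 100000000; // 10^8, i.e. 8 decimal digits

int64_t to_fixed(double d)    { return static_cast<int64_t>(d * SCALE); }
double  from_fixed(int64_t f) { return static_cast<double>(f) / SCALE; }

// a * b carries scale SCALE*SCALE, so divide once to get back to SCALE;
// beware: a * b overflows int64_t once the underlying values multiply past ~900
int64_t fixed_mul(int64_t a, int64_t b) { return a * b / SCALE; }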

> > If you can live with fixed precision, your code will run much faster if you use
> > ints or __int64's and then multiply/divide as much as you need to get the
> > precision you need.

    >
> I'm sorry, but I don't understand. How would you suggest using ints?
> What type is __int64?


It's a 64-bit integer. My Borland compiler spells it __int64, but I think int64_t is the
cross-compiler way to write it.

I'm not into math, so I don't know if you can use this method for your project.

I do a lot of gradient graphics, so when I calculate color steps I use this
method, and it is a lot faster than using floating-point values like float and
double.

    Best regards
    Asger-P
    Asger Joergensen, Feb 17, 2012
    #4
  5. Asger Joergensen <> wrote:
> If you can live with fixed precision, your code will run much faster if you use
> ints or __int64's and then multiply/divide as much as you need to get the
> precision you need.


    This isn't the early 90's. Processors have got a bit faster at
    calculating with floating point values in the meantime.
    Juha Nieminen, Feb 17, 2012
    #5
  6. Luca Cerone <> wrote:
    > I'd like to improve my code for speed.


    First of all, are you sure you are compiling with optimizations / in
release mode? It's a surprisingly common mistake not to do that.

    > double val = sqrt(2*M_PI)*csigma;


A compiler might or might not be able to calculate that "sqrt(2*M_PI)"
at compile time (compilers oftentimes do not perform floating-point
calculations at compile time for a myriad of reasons; of course it
depends on the compiler and its version). If you want to shave a few clock
cycles off, it might be a safer bet to write the value directly.
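
That is, something like:

double val = 2.50662827463100050242 * csigma; // sqrt(2*M_PI), written out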

    > val = val*exp(-pow(px-mu,2)/(2*pow(csigma,2)));


Likewise, a compiler might or might not be able to optimize a
    "pow(x,2)" into "x*x". It may be a safer bet to write it explicitly.
    (std::pow is significantly slower than multiplication.)
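
That is, the line in your function would become:

val = val * exp(-(px - mu) * (px - mu) / (2 * csigma * csigma));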

    > vector<double> normdist(const vector<double>& vx,const double& mu,const double& csigma){
    > vector<double> gv(vx.size()) ; //allocates the space for the vector gv.


    How many times is this function called in your program? Are you
    measuring its speed by calling this function thousands/millions of
    times and seeing how long it takes overall?

    If yes, then there's your major problem. Memory allocation is
    (unfortunately) a slow operation in C++ (not really C++'s fault,
    but that's another story). You might want to pass the return value
    as a reference parameter to the function and have it write there
    instead of allocating a new vector each time. Then you can reuse the
    same vector object for each call, avoiding the memory allocation
    (except for the very first time it's called, of course).
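
Something along these lines (a sketch; resize() does no allocation on later
calls if you reuse the same vector and the size doesn't grow):

void normdist(const vector<double>& vx, const double& mu,
              const double& csigma, vector<double>& gv)
{
    gv.resize(vx.size()); // reuses existing capacity on later calls
    for (size_t i = 0; i < vx.size(); ++i)
        gv[i] = normdist(vx[i], mu, csigma);
}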
    Juha Nieminen, Feb 17, 2012
    #6
Re: Writing efficient normal distribution function (for learning purposes).

    On 2/17/2012 10:41 AM, Luca Cerone wrote:
    > Dear all, I'm a beginner in C++ so I'm sorry if my question might seem silly.
    > I'm also new to the group so I just say hello to all the people in it.
    >
> I've written a function that computes the Normal Distribution, but when comparing the time to create 1000 vectors,
> each with 1000 points, I get the same performance as the analogous Matlab built-in function.
> I'd like to improve my code for speed.
> I'm doing this mainly to learn how to write efficient C++ code, so please don't just tell me
> that libraries X and Y do this faster than my code, but help me understand why and how to improve
> my code.
    >
    > I need both a single value and a vector version of this function,
    > so this is how I have defined them.
    >
    > // this is the single point version of the function:
    > double normdist(const double& px,const double& mu,const double& csigma){
    > // evaluating the formula taken from http://en.wikipedia.org/wiki/Normal_distribution
    > double val = sqrt(2*M_PI)*csigma;
    > val = 1/val;
    > val = val*exp(-pow(px-mu,2)/(2*pow(csigma,2)));
    > return val ;


    Some compilers are actually good at parsing expressions and converting
    them into efficient machine code. You could try simply writing

    return exp( -pow(px-mu,2)/2/pow(csigma,2) ) / sqrt(2*M_PI) / csigma;

    And now you can see that there are two subexpressions that don't have to
be calculated on every call. pow(csigma,2) could be calculated once for
the entire loop and passed into 'normdist' as a const double&. The same
    goes for sqrt(2*M_PI), which is just a constant you could define in the
    file scope and just use here.
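
For example (a sketch; the constant is just sqrt(2*M_PI) written out, and
csigma2 = csigma*csigma is computed once by the caller):

const double SQRT_2PI = 2.50662827463100050242; // sqrt(2*M_PI), file scope

double normdist(const double& px, const double& mu,
                const double& csigma, const double& csigma2)
{
    return exp( -(px-mu)*(px-mu)/(2*csigma2) ) / SQRT_2PI / csigma;
}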

    > }
    >
    > //this is the vector version. It takes as input a vector of double and gives the vector of corresponding values as output:
    >
    > vector<double> normdist(const vector<double>& vx,const double& mu,const double& csigma){
    > vector<double> gv(vx.size()) ; //allocates the space for the vector gv.
    > for (int i=0 ; i<vx.size() ; i++){


    While it might not matter in your particular case, get into the habit of
    not calling any function in your 'for' condition if you know that it's
not going to change. I've yet to see a compiler that knows to eliminate
    it. Try this:

    for (int i=0, s=vx.size(); i<s; ++i)

> gv[i]=normdist(vx[i],mu,csigma);
    > }



    Most modern compilers are good with indexing, but it's a function call
    for the vector, so you might want to try to eliminate it as well by
    using pointers:

    double* pgv = &gv[0];
    const double* pvx = &vx[0];

    for (...
    {
    *pgv++ = normdist(*pvx++, mu, csigma);
    }
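
Pulled together, the vector version might look something like this
(an untested sketch, using the four-argument normdist above):

vector<double> normdist(const vector<double>& vx,
                        const double& mu, const double& csigma)
{
    vector<double> gv(vx.size());
    const double csigma2 = csigma*csigma; // once per call, not per element
    double* pgv = &gv[0];
    const double* pvx = &vx[0];
    for (size_t i = 0, s = vx.size(); i < s; ++i)
    {
        *pgv++ = normdist(*pvx++, mu, csigma, csigma2);
    }
    return gv;
}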

    > return gv ;
    > }
    >
    > As I told I'm a beginner so I guess there is plenty of room for improving the speed and the quality of the code
    > (just writing to you I have realized that the initialization of gv might not be optimal, since I create values I don't need).


    Good luck!

    V
    --
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Feb 17, 2012
    #7
  8. Hi Juha

    Juha Nieminen wrote:

    > Asger Joergensen <> wrote:
> > If you can live with fixed precision, your code will run much faster if you use
> > ints or __int64's and then multiply/divide as much as you need to get the
> > precision you need.

    >
    > This isn't the early 90's. Processors have got a bit faster at
    > calculating with floating point values in the meantime.


Not nearly as fast as they can do it with ints, though, in my experience.

    Best regards
    Asger-P
    Asger Joergensen, Feb 17, 2012
    #8
  9. On Feb 17, 7:41 am, Luca Cerone <> wrote:
    > Dear all, I'm a beginner in C++ so I'm sorry if my question might seem silly.
    > I'm also new to the group so I just say hello to all the people in it.
    >
    > I've written a function that computes the Normal Distribution, but when comparing the time to create 1000 vectors
    > each with 1000 points I've the same performance than using the analogous Matlab built-in function.
    > I'd like to improve my code for speed.
    > I'm doing this mainly to learn how to write efficient C++ code, so pleasejust don't tell me
    > that library X and Y do this faster than my code, but help me understanding why and how to improve
    > my code.
    >
    > I need both a single value and a vector version of this function,
    > so this is how I have defined them.
    >
    > // this is the single point version of the function:
    > double normdist(const double& px,const double& mu,const double& csigma){
>   // evaluating the formula taken from http://en.wikipedia.org/wiki/Normal_distribution
    >   double  val = sqrt(2*M_PI)*csigma;
    >   val = 1/val;
    >   val = val*exp(-pow(px-mu,2)/(2*pow(csigma,2)));
    >   return val ;
    >
    > }
    >
    > //this is the vector version. It takes as input a vector of double and gives the vector of corresponding values as output:
    >
    > vector<double> normdist(const vector<double>& vx,const double& mu,const double& csigma){
    >   vector<double> gv(vx.size()) ; //allocates the space for the vector gv.
    >   for (int i=0 ; i<vx.size() ; i++){
>     gv[i]=normdist(vx[i],mu,csigma);
    >   }
    >   return gv ;
    >
    > }
    >
> As I said, I'm a beginner, so I guess there is plenty of room for improving the speed and the quality of the code
> (just by writing to you I have realized that the initialization of gv might not be optimal, since I create values I don't need).
    >
    > Thank you very much to all of you!
    > Luca


Are you compiling with all of the proper optimization flags? Are you using
    MSVC, and have you disabled checked iterators?
    Joshua Maurice, Feb 17, 2012
    #9
Re: Writing efficient normal distribution function (for learning purposes).

    On 2/17/2012 6:01 PM, Joshua Maurice wrote:
    > On Feb 17, 7:41 am, Luca Cerone<> wrote:
    >> [..]
>> I've written a function that computes the Normal Distribution, but when comparing the time to create 1000 vectors,
>> each with 1000 points, I get the same performance as the analogous Matlab built-in function.
    >> [.. code using vector and indexing snipped ..]

    >
> Are you compiling with all of the proper optimization flags? Are you using
    > MSVC, and have you disabled checked iterators?


    There seem to be no iterators in that code.

    If the code speed is the same as Matlab's, there is probably not much
    hope to significantly improve it. I doubt that Matlab people release
    their library/application with sub-par performance.

    V
    --
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Feb 18, 2012
    #10
  11. Paavo Helde <> wrote:

    (snip)

> pow(x,2) should be replaced by x*x, as mentioned by others; it is not a
> given that the optimizer is smart enough to do that by itself.


    I have complained about this before. Fortran compilers have been
    known to optimize x**2 for many many years. Being an operator,
    instead of a function, makes it more obvious.

    In addition, there is no pow(double, int) which would help in
    optimizing many other cases.
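
Such an overload could reduce a small integer exponent to a handful of
multiplications, e.g. by repeated squaring (a sketch):

double powi(double x, unsigned n) // computes x^n
{
    double r = 1.0;
    for (; n != 0; n >>= 1) {
        if (n & 1) r *= x; // fold in the current bit of the exponent
        x *= x;            // square for the next bit
    }
    return r;
}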

    -- glen
    glen herrmannsfeldt, Feb 18, 2012
    #11
  12. Luca Cerone

    Stefan Ram Guest

    Stefan Ram, Feb 18, 2012
    #12
  13. Luca Cerone

    Miles Bader Guest

    glen herrmannsfeldt <> writes:
>> pow(x,2) should be replaced by x*x, as mentioned by others; it is not a
>> given that the optimizer is smart enough to do that by itself.

    >
    > I have complained about this before. Fortran compilers have been
    > known to optimize x**2 for many many years. Being an operator,
    > instead of a function, makes it more obvious.
    >
    > In addition, there is no pow(double, int) which would help in
    > optimizing many other cases.


    All the C++ compilers I've used _do_ optimize "pow(x,2)" to "x*x"...
    Not very hard as optimizations go...

    -miles

    --
    Saa, shall we dance? (from a dance-class advertisement)
    Miles Bader, Feb 19, 2012
    #13
  14. Luca Cerone

    lucacerone Guest

    Hi Asger,
unfortunately this solution is not feasible for me, because the function I want to approximate is nonlinear. Thanks a lot for the advice in any case.
    lucacerone, Feb 19, 2012
    #14
  15. Luca Cerone

    lucacerone Guest

First of all, thank you all for your advice.
Many issues have arisen, so I'll try to answer them in this post.

As for performance, I measured it by evaluating vectors of 1000 points one hundred thousand times, so I think it is a reliable measurement.

As for the compiler optimization, I'm using g++ (the GNU compiler) on an Ubuntu machine. The way I compile the code is simply:

g++ -c -o mylib.o mylib.cpp
g++ -c -o main.o main.cpp
g++ -o executablename main.o mylib.o

I have absolutely no idea how to optimize the compilation...

Among the other suggestions, I read that I should substitute pow(x,2) with x*x. I'll certainly do that.
Also, it is true that in the vector version of the function certain calculations can be done only once. I'll change that as well.

Victor suggested using pointers rather than indexing.
To be honest, in studying C++ I haven't reached the pointers chapter
yet; that's why I haven't used them... and that's why I haven't fully understood your answer.
Also, I don't understand why accessing with pointers might be faster than direct indexing.

I'm sorry if I'm asking easy questions, I really am a novice and need to learn a lot :)

    Cheers, Luca
    lucacerone, Feb 19, 2012
    #15
  16. Hi lucacerone

    lucacerone wrote:

> Victor suggested using pointers rather than indexing.
> To be honest, in studying C++ I haven't reached the pointers chapter
> yet; that's why I haven't used them... and that's why I haven't
> fully understood your answer.



Have a look at this thread in this NG:
    Re: loop optimization 01.19.01

> Also, I don't understand why accessing with pointers might be faster than direct
> indexing.


When you say direct indexing, I don't think there is much difference, if any,
when it is a single array; but when you have [][] or [][][] there starts to be
a difference, and it all depends on the compiler and various little things.
If you use vector and its indexing there is a 30% difference, at least when
compiled with my CodeGear 2009 compiler. I don't know about gcc, but you can
easily test it yourself; you can use the example from the thread I mentioned
above.
P.S. Do remember that most of the time the indexing is a very small part of
the time used; in very many cases you will not be able to see the difference
in the total picture.


> I'm sorry if I'm asking easy questions, I really am a novice and need to learn a lot :)


No problem; we all learn more by communicating. Reading is good,
but communicating is better; that's my opinion.

    Best regards
    Asger-P
    Asger Joergensen, Feb 19, 2012
    #16
  17. Luca Cerone

    Miles Bader Guest

    lucacerone <> writes:
> As for the compiler optimization, I'm using g++ (the GNU compiler) on an
> Ubuntu machine. The way I compile the code is simply:
>
> g++ -c -o mylib.o mylib.cpp
> g++ -c -o main.o main.cpp
> g++ -o executablename main.o mylib.o
>
> I have absolutely no idea how to optimize the compilation...


    As a start, you should turn on optimization -- the compiler does no
    optimization unless you tell it to, and in many cases non-optimized code
    is much, _much_ slower.

To turn on optimization, use the "-O" option, optionally followed by a
digit giving the level of optimization; with gcc, the most common choice
is "-O2", which optimizes somewhat more than plain "-O".

You also want to tell it what processor you're using, because by default
it will produce "generic" code that can run on almost any processor in
the same family, but can be significantly slower than code which is
precisely targeted at the processor you're using. To do this, use the
"-march=..." option; typically you can use "-march=native", which says
"produce code targeted at the processor in this machine."

So I'd suggest:

g++ -c -o mylib.o -O2 -march=native mylib.cpp
g++ -c -o main.o -O2 -march=native main.cpp
g++ -o executablename main.o mylib.o

    -Miles

    --
    Infancy, n. The period of our lives when, according to Wordsworth, 'Heaven
    lies about us.' The world begins lying about us pretty soon afterward.
    Miles Bader, Feb 19, 2012
    #17
  18. On Feb 19, 6:09 am, Miles Bader <> wrote:
> So I'd suggest:
>
>   g++ -c -o mylib.o -O2 -march=native mylib.cpp
>   g++ -c -o main.o -O2 -march=native main.cpp
>   g++ -o executablename main.o mylib.o


    Why no -O3?

    http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options

    -O3
    Optimize yet more. -O3 turns on all optimizations specified by -O2
    and also turns on the -finline-functions, -funswitch-loops, -
    fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and -fipa-
    cp-clone options.


    Function inlining is pretty important for C++.
    Joshua Maurice, Feb 19, 2012
    #18
  19. Luca Cerone

    Marc Guest

    Joshua Maurice wrote:

    > On Feb 19, 6:09 am, Miles Bader <> wrote:
>> So I'd suggest:
>>
>>   g++ -c -o mylib.o -O2 -march=native mylib.cpp
>>   g++ -c -o main.o -O2 -march=native main.cpp
>>   g++ -o executablename main.o mylib.o

    >
    > Why no -O3?
    >
    > http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
    >
    > -O3
    > Optimize yet more. -O3 turns on all optimizations specified by -O2
    > and also turns on the -finline-functions, -funswitch-loops, -
    > fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and -fipa-
    > cp-clone options.


It is fairly safe to say that -O2 will generate better code than -O1 (which
generates better code than -O0). With -O3, things are much less clear.
Sometimes you get better results, but it is still quite common for
-O3 code to run slower than -O2. The generated code is often much larger
(and the compilation takes longer).

    > Function inlining is pretty important for C++.


    There is already some inlining at -O2 (exactly how much depends on the
version). And -O3 is not always enough to get the useful inlining;
sometimes you need -fprofile-generate/use or -finline-limit=... to
convince gcc.
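
For instance, a typical profile-guided build goes roughly like this (a sketch):

g++ -O2 -fprofile-generate -o prog main.cpp   # instrumented build
./prog                                        # run on representative input
g++ -O2 -fprofile-use -o prog main.cpp        # rebuild using the profile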

    This is not to say that -O3 should not be tried, of course. And
    possibly -ffast-math if the OP is happy with imprecise (aka wrong)
    results.
    Marc, Feb 19, 2012
    #19
  20. Luca Cerone

    Miles Bader Guest

    Joshua Maurice <> writes:
>> So I'd suggest:
>>
>>   g++ -c -o mylib.o -O2 -march=native mylib.cpp
>>   g++ -c -o main.o -O2 -march=native main.cpp
>>   g++ -o executablename main.o mylib.o

    >
    > Why no -O3?
    >
    > http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
    >
    > -O3
    > Optimize yet more. -O3 turns on all optimizations specified by -O2
    > and also turns on the -finline-functions, -funswitch-loops, -
    > fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and -fipa-
    > cp-clone options.
    >
    > Function inlining is pretty important for C++.


    -O2 does comprehensive inlining as well -- including functions not
    marked "inline" -- using gcc's inlining heuristics, which are quite
good these days. [What "-finline-functions" does is act as if _every
function_ were marked "inline", meaning gcc will be more likely to
inline stuff, even when it doesn't think it's a good idea.]

-O2 is a good default choice if you want good optimization, but don't want
    to spend a lot of time benchmarking. -O3 includes optimizations that
    may be very expensive at compile-time or may result in excessive code
    bloat (and thus can have a negative effect on the icache etc). -O3
    can _sometimes_ help, but it can also sometimes hurt, so it's a good
    idea to do some benchmarking to determine which is better.

    -Miles

    --
    Cynic, n. A blackguard whose faulty vision sees things as they are, not as
    they ought to be. Hence the custom among the Scythians of plucking out a
    cynic's eyes to improve his vision.
    Miles Bader, Feb 20, 2012
    #20
