# Writing efficient normal distribution function (for learning purposes).

Discussion in 'C++' started by Luca Cerone, Feb 17, 2012.

1. ### Luca CeroneGuest

Dear all, I'm a beginner in C++ so I'm sorry if my question might seem silly.
I'm also new to the group so I just say hello to all the people in it.

I've written a function that computes the Normal Distribution, but when comparing the time to create 1000 vectors
each with 1000 points I get the same performance as the analogous Matlab built-in function.
I'd like to improve my code for speed.
I'm doing this mainly to learn how to write efficient C++ code, so please don't just tell me
that library X or Y does this faster than my code, but help me understand why and how to improve
my code.

I need both a single value and a vector version of this function,
so this is how I have defined them.

// this is the single point version of the function:
double normdist(const double& px,const double& mu,const double& csigma){
// evaluating the formula taken from http://en.wikipedia.org/wiki/Normal_distribution
double val = sqrt(2*M_PI)*csigma;
val = 1/val;
val = val*exp(-pow(px-mu,2)/(2*pow(csigma,2)));
return val ;
}

//this is the vector version. It takes as input a vector of double and gives the vector of corresponding values as output:

vector<double> normdist(const vector<double>& vx,const double& mu,const double& csigma){
vector<double> gv(vx.size()) ; //allocates the space for the vector gv.
for (int i=0 ; i<vx.size() ; i++){
gv[i]=normdist(vx[i],mu,csigma);
}
return gv ;
}

As I said, I'm a beginner, so I guess there is plenty of room for improving the speed and the quality of the code
(just writing to you I have realized that the initialization of gv might not be optimal, since I create values I don't need).

Thank you very much to all of you!
Luca

Luca Cerone, Feb 17, 2012

2. ### Asger JoergensenGuest

Hi Luca

If You can live with fixed precision Your code will run much faster if You use
int's or __int64's and then multiply/divide as much as You need to get the
precision You need.

Best regards
Asger-P

Asger Joergensen, Feb 17, 2012

3. ### lucaceroneGuest

Let me see if I've understood: by fixed precision you mean having
numbers with a fixed number of digits? If so, yes, I can live with it
as long as I have 8 exact decimal points.

> If You can live with fixed precision Your code will run much faster if You use
> int's or __int64's and then multiply/divide as much as You need to get the
> precision You need.

I'm sorry but I don't understand you. How would you suggest using ints?
What type is __int64?

Thanks a lot for your help in any case,
Luca

lucacerone, Feb 17, 2012
4. ### Asger JoergensenGuest

Hi lucacerone

lucacerone wrote:

> Let me see if I've understood: by fixed precision you mean having
> numbers with a fixed number of digits? If so, yes, I can live with it
> as long as I have 8 exact decimal points.

int64_t val = DoubleVal * 100000000;

do the rest of the work with the int and then You divide by the same number
when You need the result.

> > If You can live with fixed precision Your code will run much faster if You use
> > int's or __int64's and then multiply/divide as much as You need to get the
> > precision You need.

>
> I'm sorry but I don't understand you. How would you suggest using ints?
> What type is __int64?

A 64-bit integer; my Borland compiler likes __int64, but I think int64_t is the
cross-compiler way to write it.

I'm not into math so I don't know if You can use this method for Your project.

I do a lot of gradient graphics, so when I calculate color steps I use this
method and it is a lot faster than using floating point values like float and
double.

Best regards
Asger-P

Asger Joergensen, Feb 17, 2012
5. ### Juha NieminenGuest

Asger Joergensen <> wrote:
> If You can live with fixed precision Your code will run much faster if You use
> int's or __int64's and then multiply/divide as much as You need to get the
> precision You need.

This isn't the early 90's. Processors have got a bit faster at
calculating with floating point values in the meantime.

Juha Nieminen, Feb 17, 2012
6. ### Juha NieminenGuest

Luca Cerone <> wrote:
> I'd like to improve my code for speed.

First of all, are you sure you are compiling with optimizations / in
release mode? It's a surprisingly common mistake not to do that.

> double val = sqrt(2*M_PI)*csigma;

A compiler might or might not be able to calculate that "sqrt(2*M_PI)"
at compile time (because compilers oftentimes do not perform floating point
calculations at compile time due to a myriad of reasons; of course it
depends on the compiler and its version). If you want to shave a few clock
cycles off, it might be a safer bet to write the value directly.

> val = val*exp(-pow(px-mu,2)/(2*pow(csigma,2)));

Likewise a compiler might or might not be able to optimize a
"pow(x,2)" into "x*x". It may be a safer bet to write it explicitly.
(std::pow is significantly slower than multiplication.)

> vector<double> normdist(const vector<double>& vx,const double& mu,const double& csigma){
> vector<double> gv(vx.size()) ; //allocates the space for the vector gv.

How many times is this function called in your program? Are you
measuring its speed by calling this function thousands/millions of
times and seeing how long it takes overall?

If yes, then there's your major problem. Memory allocation is
(unfortunately) a slow operation in C++ (not really C++'s fault,
but that's another story). You might want to pass the return value
as a reference parameter to the function and have it write there
instead of allocating a new vector each time. Then you can reuse the
same vector object for each call, avoiding the memory allocation
(except for the very first time it's called, of course).
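A minimal sketch of that reference-parameter version (the exact signature is just one possible choice):

```cpp
#include <cmath>
#include <vector>

// Fill 'out' in place instead of returning a fresh vector every call.
// Reusing the same 'out' object across calls avoids repeated heap
// allocations once its capacity is large enough.
void normdist(const std::vector<double>& vx, double mu, double csigma,
              std::vector<double>& out)
{
    out.resize(vx.size());  // cheap after the first call: capacity is kept
    const double norm  = 1.0 / (std::sqrt(2.0 * M_PI) * csigma);
    const double denom = 2.0 * csigma * csigma;  // hoisted out of the loop
    for (std::size_t i = 0, n = vx.size(); i < n; ++i) {
        const double d = vx[i] - mu;
        out[i] = norm * std::exp(-d * d / denom);
    }
}
```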

Juha Nieminen, Feb 17, 2012
7. ### Victor BazarovGuest

Re: Writing efficient normal distribution function (for learning purposes).

On 2/17/2012 10:41 AM, Luca Cerone wrote:
> Dear all, I'm a beginner in C++ so I'm sorry if my question might seem silly.
> I'm also new to the group so I just say hello to all the people in it.
>
> I've written a function that computes the Normal Distribution, but when comparing the time to create 1000 vectors
> each with 1000 points I get the same performance as the analogous Matlab built-in function.
> I'd like to improve my code for speed.
> I'm doing this mainly to learn how to write efficient C++ code, so please don't just tell me
> that library X or Y does this faster than my code, but help me understand why and how to improve
> my code.
>
> I need both a single value and a vector version of this function,
> so this is how I have defined them.
>
> // this is the single point version of the function:
> double normdist(const double& px,const double& mu,const double& csigma){
> // evaluating the formula taken from http://en.wikipedia.org/wiki/Normal_distribution
> double val = sqrt(2*M_PI)*csigma;
> val = 1/val;
> val = val*exp(-pow(px-mu,2)/(2*pow(csigma,2)));
> return val ;

Some compilers are actually good at parsing expressions and converting
them into efficient machine code. You could try simply writing

return exp( -pow(px-mu,2)/2/pow(csigma,2) ) / sqrt(2*M_PI) / csigma;

And now you can see that there are two subexpressions that don't have to
be calculated on every call. pow(csigma,2) could be calculated once per
the entire loop and passed into 'normdist' as a const double&. The same
goes for sqrt(2*M_PI), which is just a constant you could define in the
file scope and just use here.
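For example, the per-sigma constants could be computed once and passed in (a sketch; the struct and helper names are invented here):

```cpp
#include <cmath>

// Everything that depends only on sigma, computed once per loop.
struct NormConst {
    double norm;   // 1 / (sqrt(2*pi) * sigma)
    double denom;  // 2 * sigma^2
};

NormConst make_norm_const(double sigma) {
    NormConst c;
    c.norm  = 1.0 / (std::sqrt(2.0 * M_PI) * sigma);
    c.denom = 2.0 * sigma * sigma;
    return c;
}

double normdist(double px, double mu, const NormConst& c) {
    const double d = px - mu;                  // pow(d,2) replaced by d*d
    return c.norm * std::exp(-d * d / c.denom);
}
```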

> }
>
> //this is the vector version. It takes as input a vector of double and gives the vector of corresponding values as output:
>
> vector<double> normdist(const vector<double>& vx,const double& mu,const double& csigma){
> vector<double> gv(vx.size()) ; //allocates the space for the vector gv.
> for (int i=0 ; i<vx.size() ; i++){

While it might not matter in your particular case, get into the habit of
not calling any function in your 'for' condition if you know that it's
not going to change. I'm yet to see a compiler that knows to eliminate
it. Try this:

for (int i=0, s=vx.size(); i<s; ++i)

> gv[i]=normdist(vx[i],mu,csigma);
> }

Most modern compilers are good with indexing, but it's a function call
for the vector, so you might want to try to eliminate it as well by
using pointers:

double* pgv = &gv[0];
const double* pvx = &vx[0];

for (...
{
*pgv++ = normdist(*pvx++, mu, csigma);
}

> return gv ;
> }
>
> As I said, I'm a beginner, so I guess there is plenty of room for improving the speed and the quality of the code
> (just writing to you I have realized that the initialization of gv might not be optimal, since I create values I don't need).

Good luck!

V
--

Victor Bazarov, Feb 17, 2012
8. ### Asger JoergensenGuest

Hi Juha

Juha Nieminen wrote:

> Asger Joergensen <> wrote:
> > If You can live with fixed precision Your code will run much faster if You use
> > int's or __int64's and then multiply/divide as much as You need to get the
> > precision You need.

>
> This isn't the early 90's. Processors have got a bit faster at
> calculating with floating point values in the meantime.

Not nearly as fast as they can do it with int's though, in my experience.

Best regards
Asger-P

Asger Joergensen, Feb 17, 2012
9. ### Joshua MauriceGuest

On Feb 17, 7:41 am, Luca Cerone <> wrote:
> Dear all, I'm a beginner in C++ so I'm sorry if my question might seem silly.
> I'm also new to the group so I just say hello to all the people in it.
>
> I've written a function that computes the Normal Distribution, but when comparing the time to create 1000 vectors
> each with 1000 points I get the same performance as the analogous Matlab built-in function.
> I'd like to improve my code for speed.
> I'm doing this mainly to learn how to write efficient C++ code, so please don't just tell me
> that library X or Y does this faster than my code, but help me understand why and how to improve
> my code.
>
> I need both a single value and a vector version of this function,
> so this is how I have defined them.
>
> // this is the single point version of the function:
> double normdist(const double& px,const double& mu,const double& csigma){
>   // evaluating the formula taken from http://en.wikipedia.org/wiki/Normal_distribution
>   double  val = sqrt(2*M_PI)*csigma;
>   val = 1/val;
>   val = val*exp(-pow(px-mu,2)/(2*pow(csigma,2)));
>   return val ;
>
> }
>
> //this is the vector version. It takes as input a vector of double and gives the vector of corresponding values as output:
>
> vector<double> normdist(const vector<double>& vx,const double& mu,const double& csigma){
>   vector<double> gv(vx.size()) ; //allocates the space for the vector gv.
>   for (int i=0 ; i<vx.size() ; i++){
>     gv[i]=normdist(vx[i],mu,csigma);
>   }
>   return gv ;
>
> }
>
> As I said, I'm a beginner, so I guess there is plenty of room for improving the speed and the quality of the code
> (just writing to you I have realized that the initialization of gv might not be optimal, since I create values I don't need).
>
> Thank you very much to all of you!
> Luca

Are you compiling with all of the proper optimization flags? Are you using
MSVC, and have you disabled checked iterators?

Joshua Maurice, Feb 17, 2012
10. ### Victor BazarovGuest

Re: Writing efficient normal distribution function (for learning purposes).

On 2/17/2012 6:01 PM, Joshua Maurice wrote:
> On Feb 17, 7:41 am, Luca Cerone<> wrote:
>> [..]
>> I've written a function that computes the Normal Distribution, but when comparing the time to create 1000 vectors
>> each with 1000 points I get the same performance as the analogous Matlab built-in function.
>> [.. code using vector and indexing snipped ..]

>
> Are you compiling with all of the proper optimization flags? Are you using
> MSVC, and have you disabled checked iterators?

There seem to be no iterators in that code.

If the code speed is the same as Matlab's, there is probably not much
hope to significantly improve it. I doubt that Matlab people release
their library/application with sub-par performance.

V
--

Victor Bazarov, Feb 18, 2012
11. ### glen herrmannsfeldtGuest

Paavo Helde <> wrote:

(snip)

> pow(x,2) should be replaced by x*x as mentioned by others, it is not
> given that the optimizer is smart enough to do that by itself.

Fortran compilers have been known to optimize x**2 for many many years. Being an operator,
instead of a function, makes it more obvious.

In addition, there is no pow(double, int) which would help in
optimizing many other cases.

-- glen

glen herrmannsfeldt, Feb 18, 2012
12. ### Stefan RamGuest

Stefan Ram, Feb 18, 2012

13. ### Miles BaderGuest

glen herrmannsfeldt <> writes:
>> pow(x,2) should be replaced by x*x as mentioned by others, it is not
>> given that the optimizer is smart enough to do that by itself.

>
> known to optimize x**2 for many many years. Being an operator,
> instead of a function, makes it more obvious.
>
> In addition, there is no pow(double, int) which would help in
> optimizing many other cases.

All the C++ compilers I've used _do_ optimize "pow(x,2)" to "x*x"...
Not very hard as optimizations go...

-miles

--

14. ### lucaceroneGuest

Hi Asger,
this solution is not feasible for me, unfortunately, because the function I want to approximate is nonlinear. Thanks a lot for the advice in any case.

lucacerone, Feb 19, 2012
15. ### lucaceroneGuest

Many issues have arisen, so I'll try to answer them in this post.

As for performance, I measured it by evaluating vectors of 1000 points one hundred thousand times, so I think it is a reliable measurement.

As for the compiler optimization, I'm using g++ (GNU compiler) on an Ubuntu machine. The way I compile the code is simply:

g++ -c -o mylib.o mylib.cpp
g++ -c -o main.o main.cpp
g++ -o executableneme main.o mylib.o

I have absolutely no idea on how to optimize compilation...

Among the other suggestions, I read that I should substitute pow(x,2) with x*x. I'll certainly do that.
Also, it is true that in the vector version of the function certain calculations can be made only once. I'll change that as well.

Victor suggested using pointers rather than indexing.
I have to be honest: while studying C++ I haven't reached the
pointers chapter yet; that's why I haven't used them, and that's why I haven't fully understood your answer.
Also, I don't understand why accessing with pointers might be faster than direct indexing.

I'm sorry if I'm asking easy questions; I really am a novice and need to learn a lot.

Cheers, Luca

lucacerone, Feb 19, 2012
16. ### Asger JoergensenGuest

Hi lucacerone

lucacerone wrote:

> Victor suggested using pointers rather than indexing.
> I have to be honest: while studying C++ I haven't reached the
> pointers chapter yet; that's why I haven't used them, and that's why I haven't

Have a look in this thread in this NG:
Re: loop optimization 01.19.01

> Also, I don't understand why accessing with pointers might be faster than direct
> indexing.

When You say direct indexing I don't think there is much, if any, difference
when it is a single array, but when You have [][] or [][][] there starts to be
a difference; it all depends on the compiler and various little things.
If You use vector and its indexing there is a 30% difference, at least when
compiled with my CodeGear 2009 compiler. I don't know about gcc, but You can
easily test it Yourself; You can use the example from the thread I mentioned
above.
P.s. Do remember that most of the time the indexing is a very small part of
the time used; in very many cases You will not be able to see the difference
in the total picture.

> I'm sorry if I'm asking easy questions; I really am a novice and need to learn a lot.

No need to be sorry; communicating is better, that's my opinion.

Best regards
Asger-P

Asger Joergensen, Feb 19, 2012

17. ### Miles BaderGuest

lucacerone <> writes:
> As for the compiler optimization, I'm using g++ (GNU compiler) on a
> Ubuntu machine. The way I compile the code is simply:
>
> g++ -c -o mylib.o mylib.cpp
> g++ -c -o main.o main.cpp
> g++ -o executableneme main.o mylib.o
>
> I have absolutely no idea on how to optimize compilation...

As a start, you should turn on optimization -- the compiler does no
optimization unless you tell it to, and in many cases non-optimized code
is much, _much_ slower.

To turn on optimization, use the "-O" option, optionally followed by a
digit giving the level of optimization; with gcc, the most common choice
is "-O2", which optimizes somewhat more than plain "-O".

You also want to tell it what processor you're using, because by default
it will produce "generic" code that can run on almost any processor in
the same family, but can be significantly slower than code which is
precisely targeted at the processor you're using. To do this, use the
"-march=..." option; typically you can use "-march=native", which says
"produce code targeted at the processor in this machine."

So I'd suggest:

g++ -c -o mylib.o -O2 -march=native mylib.cpp
g++ -c -o main.o -O2 -march=native main.cpp
g++ -o executableneme main.o mylib.o

-Miles

--
Infancy, n. The period of our lives when, according to Wordsworth, 'Heaven
lies about us.' The world begins lying about us pretty soon afterward.

18. ### Joshua MauriceGuest

On Feb 19, 6:09 am, Miles Bader <> wrote:
> So i'd suggest:
>
>   g++ -c -o mylib.o -O2 -march=native mylib.cpp
>   g++ -c -o main.o -O2 -march=native  main.cpp
>   g++ -o executableneme main.o mylib.o

Why no -O3?

http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options

-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2
and also turns on the -finline-functions, -funswitch-loops,
-fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and
-fipa-cp-clone options.

Function inlining is pretty important for C++.

Joshua Maurice, Feb 19, 2012
19. ### MarcGuest

Joshua Maurice wrote:

> On Feb 19, 6:09 am, Miles Bader <> wrote:
>> So i'd suggest:
>>
>>   g++ -c -o mylib.o -O2 -march=native mylib.cpp
>>   g++ -c -o main.o -O2 -march=native  main.cpp
>>   g++ -o executableneme main.o mylib.o

>
> Why no -O3?
>
> http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
>
> -O3
> Optimize yet more. -O3 turns on all optimizations specified by -O2
> and also turns on the -finline-functions, -funswitch-loops,
> -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and
> -fipa-cp-clone options.

It is fairly safe that -O2 will generate better code than -O1 (which
generates better code than -O0). With -O3, things are much less clear.
Sometimes you get better results, but it is still quite common that
-O3 code runs slower than -O2. The generated code is often much larger
(and the compilation takes longer).

> Function inlining is pretty important for C++.

There is already some inlining at -O2 (exactly how much depends on the
version). And -O3 is not always enough to get the useful inlining,
sometimes you need -fprofile-generate/use or -finline-limit=... to
convince gcc.

This is not to say that -O3 should not be tried, of course. And
possibly -ffast-math if the OP is happy with imprecise (aka wrong)
results.

Marc, Feb 19, 2012

20. ### Miles BaderGuest

Joshua Maurice <> writes:
>> So i'd suggest:
>>
>> g++ -c -o mylib.o -O2 -march=native mylib.cpp
>> g++ -c -o main.o -O2 -march=native main.cpp
>> g++ -o executableneme main.o mylib.o

>
> Why no -O3?
>
> http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options
>
> -O3
> Optimize yet more. -O3 turns on all optimizations specified by -O2
> and also turns on the -finline-functions, -funswitch-loops,
> -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and
> -fipa-cp-clone options.
>
> Function inlining is pretty important for C++.

-O2 does comprehensive inlining as well -- including functions not
marked "inline" -- using gcc's inlining heuristics, which are quite
good these days. [What "-finline-functions" does is act as if _every
function_ were marked "inline", meaning gcc will be more likely to
inline stuff, even when it doesn't think it's a good idea.]

-O2 is a good default choice if you want good optimization, but don't want
to spend a lot of time benchmarking. -O3 includes optimizations that
may be very expensive at compile-time or may result in excessive code
bloat (and thus can have a negative effect on the icache etc). -O3
can _sometimes_ help, but it can also sometimes hurt, so it's a good
idea to do some benchmarking to determine which is better.

-Miles

--
Cynic, n. A blackguard whose faulty vision sees things as they are, not as
they ought to be. Hence the custom among the Scythians of plucking out a
cynic's eyes to improve his vision.