Writing an efficient normal distribution function (for learning purposes).

Miles Bader

Marc said:
This is not to say that -O3 should not be tried, of course. And
possibly -ffast-math if the OP is happy with imprecise (aka wrong)
results.

.... and if "-ffast-math" isn't acceptable itself, it's often still
worth looking at the sub-options which -ffast-math turns on (look in
the gcc manual, it gives a good description of what it does).
"-fno-math-errno" in particular can give some nice speedups by
dropping standards requirements that very few programs actually care
about.
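
For instance (an illustrative sketch, not from the original post): with the
default -fmath-errno, every call to std::sqrt may have to set errno on a
domain error, which typically prevents GCC from lowering the call to a bare
hardware square-root instruction and auto-vectorizing the loop; with
-fno-math-errno it usually can.

#include <cmath>
#include <cstddef>

// A loop like this typically only gets turned into straight-line (and
// vectorized) sqrt instructions at -O3 when the compiler is allowed to
// ignore errno, i.e. under -fno-math-errno.
void sqrt_all(const double* in, double* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}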

Similarly, one of the options -ffast-math turns on,
"-funsafe-math-optimizations" (which can often yield good speedups)
actually has useful sub-options itself. Of these "-fno-trapping-math"
is not problematic for most programs, and can be very useful, as it
allows gcc to speculate expensive operations like division (because it
knows that they can't result in an exception).

[What I do is use "-ffast-math" and then _disable_ the individual
sub-options that I don't want...]

-miles
 
Juha Nieminen

Marc said:
And
possibly -ffast-math if the OP is happy with imprecise (aka wrong)
results.

How are the results imprecise/wrong when using -ffast-math?

Granted, I don't know every single detail, but I have the impression
that what it does is that it skips some of the most pedantic rules of
IEEE floating point math in favor of faster code. These minute pedantic
rules do not affect the accuracy of the end result in any significant
way (more than what can be expected from floating point values anyways).

(I do not know if/how it affects situations where undefined results
are produced, such as NaNs or INFs, or in overflow/underflow situations.
In those cases, however, I don't think -ffast-math is going to screw up
your program, unless it uses non-standard code to specifically check for
those situations.)

Can you provide an example where -ffast-math produces a clearly incorrect
result?

(Triggering a *bug* in gcc does not count. I have personally reported
one such bug, and it was fixed.)
 
lucacerone

Thanks Miles, this really improved the speed (doubled it)!
Where can I learn all the tricks I might need about the compiler?
 
lucacerone

Thanks to all of you guys, I have had some nice advice from this post!
I do some numerical simulations, but I'm interested in qualitative
rather than quantitative results.
I'd like to know how inaccurate the results are when using those math optimization options you were mentioning.

Also, I'd like to know how you would have written those two functions in order to make them faster.
I'm not constrained to use vectors, nor any specific type.
I have used vectors because this is what I've read in Stroustrup's book...
I haven't used pointers because I still haven't fully understood pointer arithmetic.
I have passed values by reference to the function because I thought it might be better for memory.

So, just to say, I'm trying to learn the various alternatives for writing functions
that are both easy to reuse and fast to evaluate.
So if you think my version should be entirely rewritten, and you have the time, can you help me understand how?

Thanks a lot for your help again!!!

P.s. slightly OT, but some of you are using markup in your messages..
where can I find a list so that I can use it as well?
 
Miles Bader

Juha Nieminen said:
How are the results imprecise/wrong when using -ffast-math?

Granted, I don't know every single detail, but I have the impression
that what it does is that it skips some of the most pedantic rules of
IEEE floating point math in favor of faster code. These minute pedantic
rules do not affect the accuracy of the end result in any significant
way (more than what can be expected from floating point values anyways).

It's not that -ffast-math is bad or anything (I use it happily
myself), but if precise results are important it's a good idea to
check what it does in detail, and make sure things are OK. One can
always, of course, use -ffast-math, and then _disable_ some of the
finer-grained options it enabled.

For instance: -ffast-math enables -funsafe-math-optimizations, which
enables -freciprocal-math, which: "Allows the reciprocal of a value to
be used instead of dividing by the value if this enables optimizations
.... Note that this loses precision and increases the number of flops
operating on the value."

-funsafe-math-optimizations also enables -fassociative-math, and
changing the order of operations can affect the precision of the
result in a way that the programmer didn't expect.
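
A small hand-written illustration (not from the original post) of how
reassociation alone can change a result:

#include <cstdio>

int main()
{
    const double big = 1e16;
    const double a = (big + 1.0) - big;   // the 1.0 is absorbed by rounding: 0
    const double b = (big - big) + 1.0;   // mathematically the same, but gives 1
    std::printf("%g vs %g\n", a, b);      // prints "0 vs 1" with IEEE doubles
}

-fassociative-math permits the compiler to rewrite one form into the other,
so the answer can silently change.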

-ffast-math also enables -ffinite-math-only, which of course will
cause infinities etc to not be handled as expected.
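
A typical gotcha (behaviour varies with the GCC version, so treat this as an
illustration): under -ffinite-math-only the compiler may assume NaNs cannot
occur and fold an isnan() check away entirely.

#include <cmath>
#include <cstdio>

int main()
{
    volatile double zero = 0.0;
    double x = zero / zero;            // produces a NaN at run time
    if (std::isnan(x))
        std::puts("caught the NaN");   // taken when compiled normally
    else
        std::puts("missed it");        // often taken under -ffast-math
}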

etc.

-miles
 
gwowen

lucacerone said:
Dear all, I'm a beginner in C++, so I'm sorry if my question might seem silly.
I'm also new to the group, so I'll just say hello to all the people in it.

I've written a function that computes the normal distribution, but when comparing the time
to create 1000 vectors, each with 1000 points, I get the same performance as the
analogous Matlab built-in function.

As an aside, note that what you're calculating is what Matlab calls
normpdf().

It's a reasonable thing to calculate, but unless you're plotting a
curve, it's not actually a lot of use. You're more likely to need to
emulate matlab's randn() [which generates a vector of random values
from a normal distribution] or normcdf() [the cumulative probability
distribution, or "area under the normal curve"].
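
For what it's worth, the closest standard C++ analogue of Matlab's randn() is
std::normal_distribution from <random> (C++11). A minimal sketch (the names
and defaults are just for illustration):

#include <cstddef>
#include <random>
#include <vector>

// Fill a vector with n samples from N(mu, sigma^2), similar to Matlab's randn().
std::vector<double> randn(std::size_t n, double mu = 0.0, double sigma = 1.0)
{
    static std::mt19937 gen{std::random_device{}()};
    std::normal_distribution<double> dist(mu, sigma);

    std::vector<double> v(n);
    for (double& x : v)
        x = dist(gen);
    return v;
}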
 
Dombo

Op 20-Feb-12 11:24, Juha Nieminen schreef:
How are the results imprecise/wrong when using -ffast-math?

Granted, I don't know every single detail, but I have the impression
that what it does is that it skips some of the most pedantic rules of
IEEE floating point math in favor of faster code.

Floating point math is (almost) always imprecise; whether that means it
is wrong depends very much on how much precision is needed by the
application.

In the old days when the x87 instructions on x86 platforms were used for
floating point math, you could tell many compilers to keep intermediate
results in the 80-bit floating point registers (fast), or to always read
and write intermediate results as 64-bit floating point numbers to
memory (slow). When using the first option you never really know when an
operation is performed with 80-bit precision and at what point it gets
rounded down to 64-bits. With the second option you know that the result
is always rounded down to 64-bits at every step.

In other words; it is not so much about precision, but more about
getting consistent results (there is a difference).

Now that SSE instructions (which do not have 80-bit registers) are commonly
used for floating point math on x86 platforms, I'm not quite sure how
these settings would affect floating point math. I believe that the SSE
instructions do deviate in some respects from the IEEE 754 standard, so
some additional code may be needed to fix this for those who care.
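
A hand-rolled illustration (not from the original post) of why the evaluation
width matters: the intermediate product below overflows a 64-bit double but
fits comfortably in an 80-bit x87 register.

#include <cstdio>

int main()
{
    volatile double a = 1e200, b = 1e200, c = 1e200;
    // a*b is 1e400: out of range for a double, in range for the x87 format.
    double r = a * b / c;
    std::printf("%g\n", r);   // typically 1e+200 with x87 (e.g. -mfpmath=387),
                              // inf with SSE2 (-mfpmath=sse) or -ffloat-store
}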
 
Juha Nieminen

Dombo said:
Now that SSE instructions (which do not have 80-bit registers) are commonly
used for floating point math on x86 platforms, I'm not quite sure how
these settings would affect floating point math.

As someone pointed out, something like this:

x = a / c;
y = b / c;

does not (usually) give the *exact* same result as:

tmp = 1 / c;
x = a * tmp;
y = b * tmp;

The difference will happen in the few least-significant bits of the
mantissa (iow. somewhere around the 15th least-significant digit with
doubles), which is extremely small, but might count in some rare
circumstances.

The latter, however, is often faster than the former (because floating
point multiplication will usually be something like 1 clock cycle while
division will be a dozen or so).
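
A small self-contained test (illustrative, not from the original post) makes
the difference easy to see: it compares a/c against a*(1/c), which is
effectively what -freciprocal-math does, for a range of numerators.

#include <cstdio>

int main()
{
    const double c = 3.0;
    const double tmp = 1.0 / c;               // the precomputed reciprocal
    int differ = 0;
    for (int i = 1; i <= 1000; ++i) {
        const double a = static_cast<double>(i);
        if (a / c != a * tmp)                 // differs only in the last bits
            ++differ;
    }
    std::printf("%d of 1000 results differ\n", differ);
}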
 
88888 Dihedral

Juha Nieminen wrote on Tuesday, 21 February 2012, at 16:41:14 UTC+8:
As someone pointed out, something like this:

x = a / c;
y = b / c;

does not (usually) give the *exact* same result as:

tmp = 1 / c;

tmp is truncated to finite precision here;
if the rounding error in tmp (relative to the true 1/c), scaled by a, affects the trailing bits of a*tmp,
then the result differs from a/c computed
directly by a float or double division instruction.
 
lucacerone

Hi Gwowen, thanks, but I actually want to implement the analogue of Matlab's normpdf :) As I said, I have no specific task in mind, I just want to learn how to write efficient code :)
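
For reference, a minimal sketch (not the original code from the thread) of how
a normpdf-style function is often written for speed: hoist everything that
depends only on mu and sigma out of the loop, preallocate the output, and keep
the loop body down to a subtraction, a few multiplications and one exp() call.

#include <cmath>
#include <cstddef>
#include <vector>

std::vector<double> normpdf(const std::vector<double>& x, double mu, double sigma)
{
    const double pi = 3.14159265358979323846;
    const double inv_two_sigma_sq = 1.0 / (2.0 * sigma * sigma); // loop-invariant
    const double norm = 1.0 / (sigma * std::sqrt(2.0 * pi));     // loop-invariant

    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - mu;
        y[i] = norm * std::exp(-d * d * inv_two_sigma_sq);
    }
    return y;
}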
 
