finecur wrote:
...
You have a number of issues:
a) you're not compiling with OpenMP enabled - use -fopenmp
b) your computation is not long enough to measure adequately - it needs to
run for seconds at least - preferably more than 5.
c) if you did make N large enough, the arrays would be too big for your stack
d) clock() has nowhere near enough resolution to measure the relative
performance of either version.
e) sharing sum and updating it on every iteration would cause a lot of
contention on the shared add - best to remove that.
f) This is comp.lang.c++ so use C++ !!!
Try this code:
to compile without openmp use:
g++ -O3 -march=x86-64 -mfpmath=sse -funroll-loops -fomit-frame-pointer
xxx_openmp3.cpp -o xxx_openmp3
.. and with openmp use:
g++ -O3 -march=x86-64 -mfpmath=sse -funroll-loops -fomit-frame-pointer
xxx_openmp3.cpp -o xxx_openmp3 -fopenmp
==========
#include <omp.h>
#include <cmath>
#include <iostream>
#define CHUNKSIZE 100000
#define NX 10000000
#define NUM 1
template <typename T>
T compute(T a, T b)
{
return std::sin(a) * std::sin(b);
}
template <typename T, int N>
struct Compute
{
T a[N], b[N], c[N], sum;
Compute()
: sum()
{
for (int i=0; i < N; i++){
a[i] = b[i] = i * 1.0;
}
}
void DoWork()
{
int i, chunk = CHUNKSIZE;
// maintain references to arrays - speeds things up a tad
T (&la)[N] = a;
T (&lb)[N] = b;
T (&lc)[N] = c;
T lsum = T();
sum = T();
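// firstprivate: each thread's copy of lsum starts from the zero set above
// (a plain private copy would be uninitialized)
#pragma omp parallel shared(chunk) private(i) firstprivate(lsum)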
{
#pragma omp for schedule(dynamic,chunk) nowait
for (i = 0; i < N; i++){
lsum += lc[i] = compute(la[i], lb[i]);
}
#pragma omp critical
{
sum += lsum;
}
}
}
};
Compute<double,NX> nx;
int main ()
{
nx.DoWork();
std::cout.precision(16);
std::cout << "sum = " << nx.sum << "\n";
}
==========
time ./xxx_openmp3
sum = 5000000.034065126
real 0m0.992s
user 0m1.588s
sys 0m0.152s
.... without -fopenmp
time ./xxx_openmp3
sum = 5000000.034065164
real 0m1.806s
user 0m1.632s
sys 0m0.152s
... note the result is different due to round-off !!! It will even
change from run to run because the order of the summation is different. You
can fix this, but you need to save the intermediate sums and add them in a
fixed order. In theory, you should really add the values in order, starting
with the ones with the smallest absolute value, which requires sorting the
values in array c.
Here is what you get now:
./xxx_openmp3 & ./xxx_openmp3 & ./xxx_openmp3 ; sleep 3
[1] 30411
[2] 30412
sum = 5000000.034065094
sum = 5000000.03406521
sum = 5000000.034065165
...