Patricia said:
The squares of (b[j]-b[k]) and (b[k]-b[j]) are equal. In terms of Chris'
sample code, once Math.exp(f * incr * incr) has been calculated, it
should be added to both Array[j] and Array[k].
Nice.
With the code now reading:
=============
double f = -0.5 / d / d;
for (int j = m; --j >= 0
{
double inputJ = input[j];
double sumJ = 1.0;
for (int k = m; --k > j
{
double diff = input[k]-inputJ;
double incr = Math.exp(f * diff * diff);
sumJ += incr;
output[k] += incr;
}
output[j] = sumJ;
}
for (int j = m; --j >= 0
output[j] /= c;
=============
(Where I've renamed the original b[] and Array[] to input[] and output[]
respectively. I'm being explicit about assuming the output is initially
zeroed. If that's shouldn't be true then the OP could either add my output[]
to his Array[], or re-instate the division by c that I've moved out of the
inner loop.)
BTW, by initialising sumJ to 1, and terminating the loop one step too early we
can avoid doing exp() when j==k.
With the conditions as before, m=10000, the run time comes down to about 12.5
seconds for this new code. Scaling up to m=150000 should now run in about 50
minutes (on my machine) assuming that cache effects do not seriously affect the
scaling (I have no evidence either way, and am not about to try a 50 minute run
just to find out ;-)
The output[] array is in fact palindromic, but the only algorithm I've been
able to think of to take advantage of the fact is the following very messy
code:
=============
double f = -0.5 / d / d;
int halfway = (m - 1) / 2;
for (int j = m; --j >= halfway
{
double inputJ = input[j];
double sumJ = 1.0;
for (int k = m; --k > j
{
double diff = input[k]-inputJ;
double incr = Math.exp(f * diff * diff);
sumJ += incr;
output[k] += incr;
}
output[j] = sumJ;
}
for (int j = halfway; --j >= 0
{
double inputJ = input[j];
double sumJ = 1.0;
int stop = Math.max(j, halfway-1);
for (int k = m; --k > stop
{
double diff = input[k]-inputJ;
double incr = Math.exp(f * diff * diff);
output[k] += incr;
}
}
for (int j = m; --j >= halfway
output[j] /= c;
for (int j = 0; j < halfway; j++)
output[j] = output[m-j-1];
=============
I have a lot of trouble with fencepost errors at the best of times, and in code
like the above I have /no/ confidence that I've got the details right (though
it does /appear/ to work correctly). Anyway, running that with m=10000 takes
8.5 seconds, which would suggest that my machine might be able to complete the
m=150K case in around half an hour.
-- chris