Explanation requested for this speed increase

V

velthuijsen

I have a function that before I modified it took around 13.75 seconds
to complete after the modification it took .325 seconds to complete.

the function header:
(Point **Input, size_t InputSize, Point *Output, size_t OutputSize);

Point is a simple x,y,z coordinate structure (3 doubles).
TheDistances is an array containing room for InputSize doubles.

Most of the time consumed (99%+) is in the following for loops:

for (size_t i = 0; i < OutputSize; ++i)
{
for (size_t j = 0; j < OutputSize; ++j)
{
for (size_t k = 0; k < InputSize; ++k)
{
TheDistances[k] = sqrt ( pow(TempPoint.x - (*Input)[k].x,2 ) +
pow(TempPoint.y - (*Input)[k].y, 2) );
if (TheDistances[k] != 0)
{
TheDistances[k] = (pow( TheDistances[k],2) *
(log(TheDistances[k])-1));

}
}
} // for (int j = 0; j < OutputSize; ++j)
} // for (int i = 0; i < OutputSize; ++i)

This takes roughly 13.75 seconds.


By copying the values in Input to Point* InputCopy and then
replacing the innerloop with
TheDistances[k] = sqrt ( pow(TempPoint.x - InputCopy[k].x,2 ) +
pow(TempPoint.y - InputCopy[k].y, 2) );

I get the speed up to roughly 0.325 seconds.
What am I missing here that i get this efficiency increase?
 
A

Achintya

I have a function that before I modified it took around 13.75 seconds
to complete after the modification it took .325 seconds to complete.

the function header:
(Point **Input, size_t InputSize, Point *Output, size_t OutputSize);

Point is a simple x,y,z coordinate structure (3 doubles).
TheDistances is an array containing room for InputSize doubles.

Most of the time consumed (99%+) is in the following for loops:

for (size_t i = 0; i < OutputSize; ++i)
{
for (size_t j = 0; j < OutputSize; ++j)
{
for (size_t k = 0; k < InputSize; ++k)
{
TheDistances[k] = sqrt ( pow(TempPoint.x - (*Input)[k].x,2 ) +
pow(TempPoint.y - (*Input)[k].y, 2) );
if (TheDistances[k] != 0)
{
TheDistances[k] = (pow( TheDistances[k],2) *
(log(TheDistances[k])-1));

}
}
} // for (int j = 0; j < OutputSize; ++j)
} // for (int i = 0; i < OutputSize; ++i)

This takes roughly 13.75 seconds.


By copying the values in Input to Point* InputCopy and then
replacing the innerloop with
TheDistances[k] = sqrt ( pow(TempPoint.x - InputCopy[k].x,2 ) +
pow(TempPoint.y - InputCopy[k].y, 2) );

I get the speed up to roughly 0.325 seconds.
What am I missing here that i get this efficiency increase?

Hi,

But where are you using the outer loop varaibles i and j...it seems you
are repeating (OutputSize)^2 times.

-vs_p
 
V

velthuijsen

whoops thought I copied the entire inside section. The i and j loop are
used to get a value for TempPoint.
And yes this thing loops OutputSize^2 * InputSize.
This is the modified section (altered a bit more to lower roundof
error)
for (size_t i = 0; i < OutputSize; ++i)
{
for (size_t j = 0; j < OutputSize; ++j)
{
TempPoint = Output[j+i*OutputSize];
TempPoint.z = 0;
for (size_t k = 0; k < InputSize; ++k)
{
TheDistances[k] = pow(TempPoint.x - WorkCopyOfInput[k].x,2 ) +
pow(TempPoint.y - WorkCopyOfInput[k].y, 2);
if (TheDistances[k] != 0)
{
TheDistances[k] = (pow( TheDistances[k],2) *
(log(sqrt(TheDistances[k]))-1));
}
}
for (size_t k= 0; k < InputSize; ++k)
{
TempPoint.z+=(TheDistances[k]*Weights[k]);
}
Output[j+i*OutputSize].z = TempPoint.z;
} // for (int j = 0; j < OutputSize; ++j)
} // for (int i = 0; i < OutputSize; ++i)

The original part in the innermost loop was:
TheDistances[k] = pow(TempPoint.x - (*Input)[k].x,2 ) +
pow(TempPoint.y - (*Input)[k].y, 2);
if (TheDistances[k] != 0)
{
TheDistances[k] = (pow( TheDistances[k],2) *
(log(sqrt(TheDistances[k]))-1));
 
M

msalters

I have a function that before I modified it took around 13.75 seconds
to complete after the modification it took .325 seconds to complete.

the function header:
(Point **Input, size_t InputSize, Point *Output, size_t OutputSize);
Most of the time consumed (99%+) is in the following for loops: ....
This takes roughly 13.75 seconds.

By copying the values in Input to Point* InputCopy ...
I get the speed up to roughly 0.325 seconds.
What am I missing here that i get this efficiency increase?

A logical assumption is that the compiler spotted that
InputCopy doesn't change. This means it can cache these
values more aggresively. Input points to unknown memory,
which might even overlap Output. That means the compiler
has to be rather careful, and not use cached Input values
at all.

HTH,
Michiel Salters
 
K

Karl Heinz Buchegger

msalters said:
A logical assumption is that the compiler spotted that
InputCopy doesn't change. This means it can cache these
values more aggresively. Input points to unknown memory,
which might even overlap Output. That means the compiler
has to be rather careful, and not use cached Input values
at all.

That's a sensible explanation. But honestly, I don't buy it.
Not for a speedup from 13 to 0.3 seconds in a tight O(3) loop,
which is full of calls to sqrt() and pow(). Even if the value
of *Input is fetched from memory every time, its time would be
overwhelmed by the remaining code.

To the OP: Are you sure you don't compare apples with oranges?
That is: A debug build with a release build
I notice that i and j are *not* used anywhere in the
loop body. So while the compiler will let those 2 loops
intact in debug mode, it would optimize them away in a
release build.
Therefore the debug build would do the very same
calculation OutputSize*OutputSize number of times, while
the release build does it only once.

Are you sure you didn't ruin anthing else? Did you
check if the results are still correct?
 
V

velthuijsen

Not comparing a release build with a debug build.
I first did a short version of the problem but then figured out that
that would mangle my question and I thought I replaced the entire bit
with the code in the loops (I posted the complete version later).
i and j are used.to grab a new value for TempPoint.

Still, it's puzzlign that this is happening in anycase the amount of
extra memory used by copying the input isn't that huge (compared to
some of the other memory demands of the program this function is in).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top