optimisation of a function

A

anon

Hello

I am working on the optimization of a function, which should do
something extensive work. Running the profiler, I identified this
function to be the bottleneck.

Simplified function looks like this:

void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
const int data1 = p[SIZE*i+0];
const int data2 = p[SIZE*i+1];
const int data3 = p[SIZE*i+2];
const int data4 = p[SIZE*i+3];
// use data in the calculation and return the result to
// p[j+SIZE*i]
}
}

SIZE - some const predefined value

Now they introduced another variable number of data used in the
calculations, and I have to modify this to get the optimal function.
Number of data is not that big (2 to 8).

Now, I have several solutions, and would like to hear which in your
opinion is the best.

Solution 1:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+k];
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

Solution 2:
Define 7 different functions, which do the calculation on 2 to 8 number
of data. After all, the template will unroll into these.

Maybe solution 3?

Looking forward to hear from you.

Cheers!
 
A

Andre Kostur

Hello

I am working on the optimization of a function, which should do
something extensive work. Running the profiler, I identified this
function to be the bottleneck.

Simplified function looks like this:

void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
const int data1 = p[SIZE*i+0];
const int data2 = p[SIZE*i+1];
const int data3 = p[SIZE*i+2];
const int data4 = p[SIZE*i+3];
// use data in the calculation and return the result to
// p[j+SIZE*i]
}
}

Um... does your profiler say that your function is taking too much time
initializing those variables?
SIZE - some const predefined value

Now they introduced another variable number of data used in the
calculations, and I have to modify this to get the optimal function.
Number of data is not that big (2 to 8).

"optimal" function is rather ambiguous. However, one thing to note, the
variables data1 through data4 aren't dependant in any way on the value
of j. So why not lift those four variables out of the j loop. Saves
you having to reinitialize them (i * (j / 4)) times.
Now, I have several solutions, and would like to hear which in your
opinion is the best.

Solution 1:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+k];
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

Same deal.. your initialization of pData isn't dependant on j, so bring
it out of the loop. Also, isn't your j loop supposed to be "j += N"
for the increment stage?
Solution 2:
Define 7 different functions, which do the calculation on 2 to 8
number of data. After all, the template will unroll into these.

Ick. Template's probably better. Also with templates, if you don't use
it, it won't get instantiated. May result in smaller code.

Third possibility, I'm not sure how you're using the pData variables,
but perhaps references might be an idea too. Or have pData simply point
to the beginning of the range of values you're looking at (they were
originally declared as const, so I'm assuming that you're not going to
be modifying those values):

int * pData = &p[SIZE * i];

And then use pData[0] -> pData[N - 1] as appropriate.

Of course... run the profiler on the modified versions to see if you've
had any effect.
 
A

anon

Andre said:
Hello

I am working on the optimization of a function, which should do
something extensive work. Running the profiler, I identified this
function to be the bottleneck.

Simplified function looks like this:

void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
const int data1 = p[SIZE*i+0];
const int data2 = p[SIZE*i+1];
const int data3 = p[SIZE*i+2];
const int data4 = p[SIZE*i+3];
// use data in the calculation and return the result to
// p[j+SIZE*i]
}
}

Um... does your profiler say that your function is taking too much time
initializing those variables?
SIZE - some const predefined value

Now they introduced another variable number of data used in the
calculations, and I have to modify this to get the optimal function.
Number of data is not that big (2 to 8).

"optimal" function is rather ambiguous. However, one thing to note, the
variables data1 through data4 aren't dependant in any way on the value
of j. So why not lift those four variables out of the j loop. Saves
you having to reinitialize them (i * (j / 4)) times.

Sorry, thats typo :( Those depend on j:
const int data1 = p[SIZE*i+4*j+0];
const int data2 = p[SIZE*i+4*j+1];
const int data3 = p[SIZE*i+4*j+2];
const int data4 = p[SIZE*i+4*j+3];
Now, I have several solutions, and would like to hear which in your
opinion is the best.

Solution 1:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ 4 )
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+k];
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

Same deal.. your initialization of pData isn't dependant on j, so bring
it out of the loop. Also, isn't your j loop supposed to be "j += N"
for the increment stage?

yes. That + same thing for pData:
template < int N >
void a( int n, int m, int *p )
{
for ( int i=0; i<n; ++i )
{
for ( int j=0; j<m; j =+ N ) <<<
{
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+N*j+k]; <<<
}

// use pData in the calculation and return the result to
// p[j+SIZE*i]
}
}

Ick. Template's probably better. Also with templates, if you don't use
it, it won't get instantiated. May result in smaller code.

I am mostly concerned with performances for this function. If it is
better to have 8 different function, I would go for it.
My profiler says that I loose some time in the 3rd loop:
int pData[N];
for ( int k=0; k<N; ++k)
{
pData[k] = p[SIZE*i+N*j+k];
}
to initialize these values, and to initialize and handle k.
Third possibility, I'm not sure how you're using the pData variables,
but perhaps references might be an idea too. Or have pData simply point

Those data is used in a complex math function, impossible to optimize
further.
to the beginning of the range of values you're looking at (they were
originally declared as const, so I'm assuming that you're not going to
be modifying those values):

int * pData = &p[SIZE * i];

And then use pData[0] -> pData[N - 1] as appropriate.

Of course... run the profiler on the modified versions to see if you've
had any effect.

Yes, thats the best test.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,780
Messages
2,569,608
Members
45,244
Latest member
cryptotaxsoftware12

Latest Threads

Top