Jeff
Hello
Sorry if this isn't entirely a C language question - perhaps someone could
suggest a more appropriate group?
I'm running the appended code on a MIPS R12000 processor and am getting very
confused about why the use of the temporary array (tempC) can give such a
large speed-up (50% for size<128).
The alternative to using tempC is to write the result of the inner dot
product directly into C at the end of each inner loop. As far as I can see
this is not a caching issue. My hunch is that this is related to the fact
that the size of tempC is well within a single base+offset load whereas the
size of C itself is much larger than this.
Is anyone familiar with this issue?
Many Thanks
Jeff
for(i=0; i<size; i++)
{
    for(j=0; j<size; j++)
    {
        rowBPosition = size*j;
        x=0;
        for(k=0; k<size; k++)
        {
            x+=_pA[k] * pB[rowBPosition+k];
        }
        tempC[j]=x;
    }
    // write tempC into a row of C
    while(_tempC<tempCEnd)
        *_pC++=*_tempC++;
    _pA+=size;
    _tempC=tempC;
}
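
For anyone who wants to time the two variants side by side, here is a minimal self-contained sketch of both. It rests on several assumptions that are not in the fragment above: the element type is taken to be double, MAX_SIZE is a hypothetical upper bound on size, and the function names multiply_with_temp and multiply_direct are made up for illustration. It only reproduces the two store patterns; it does not by itself explain the timing difference.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SIZE 128  /* assumed upper bound on size (hypothetical) */

/* Variant 1: collect one row of results in tempC, then copy it into C. */
void multiply_with_temp(const double *pA, const double *pB,
                        double *pC, size_t size)
{
    double tempC[MAX_SIZE];
    size_t i, j, k;

    assert(size <= MAX_SIZE);
    for (i = 0; i < size; i++) {
        const double *rowA = pA + i * size;
        for (j = 0; j < size; j++) {
            const double *rowB = pB + j * size;  /* same as pB[size*j + k] */
            double x = 0;
            for (k = 0; k < size; k++)
                x += rowA[k] * rowB[k];
            tempC[j] = x;
        }
        /* write tempC into row i of C */
        for (j = 0; j < size; j++)
            pC[i * size + j] = tempC[j];
    }
}

/* Variant 2: store each dot product directly into C. */
void multiply_direct(const double *pA, const double *pB,
                     double *pC, size_t size)
{
    size_t i, j, k;

    for (i = 0; i < size; i++) {
        const double *rowA = pA + i * size;
        for (j = 0; j < size; j++) {
            const double *rowB = pB + j * size;
            double x = 0;
            for (k = 0; k < size; k++)
                x += rowA[k] * rowB[k];
            pC[i * size + j] = x;  /* no temporary row */
        }
    }
}
```

Both functions should produce identical contents in C for the same inputs, so any difference in run time comes from how the stores are issued rather than from the arithmetic.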