how to speed up the matrix multiplication in C

V

VijaKhara

Hi all,

Is there any method which can implememt the matrix multiplication
faster than using the formula as we often do by hand?
I am writing the following code and my matrice: one is 3x40000 and the
other one 40000x3.
the program runs too slow:

/*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/

for (i=0;i<3;i++)
{
for (j=0;j<3;j++)
{
w[j]=0;
for (k=0;k<40000;k++)
{
w[j]+=xyz_trans[k]*xyz[k][j];
}

}
}

What is your code when you multiply 2 big matrice? Is there any faster
algorithm? I see MATLAB handles this task so quick.

Thanks
 
C

Chris Dollin

Is there any method which can implememt the matrix multiplication
faster than using the formula as we often do by hand?

Yes, but AFAIK its rather hairy and only useful for large (no,
larger than that) matricies. IMBW.
I am writing the following code and my matrice: one is 3x40000 and the
other one 40000x3.
the program runs too slow:

/*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/

for (i=0;i<3;i++)
{
for (j=0;j<3;j++)
{
w[j]=0;
for (k=0;k<40000;k++)
{
w[j]+=xyz_trans[k]*xyz[k][j];
}

}
}


How slow is "too slow"? It's pretty much instant on the-here machine.
(The inner loop is only executed 360000 times. I'm assuming that
the arrays are float/double.) But if you /really/ want more speed:

Um. Well. You're not exactly /helping/ the compiler. For example, the
expression `w[j]` is evaluated /every time round the inner loop/.
So is the expression `xyz_trans`. You can have a temp for w[j],
and you can declare a local for xyz_trans.
 
M

Michael

I am writing the following code and my matrice: one is 3x40000 and the
other one 40000x3.
the program runs too slow:

/*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/

for (i=0;i<3;i++)
{
for (j=0;j<3;j++)
{
w[j]=0;
for (k=0;k<40000;k++)
{
w[j]+=xyz_trans[k]*xyz[k][j];
}

}
}

What is your code when you multiply 2 big matrice? Is there any faster
algorithm? I see MATLAB handles this task so quick.


I'm going to risk blowing smoke (i.e., I haven't tested what I'm
saying so take it with a huge grain of salt). With that in mind, here
are some ideas, which you can try and see if they help.

1) Split into two parts (as the other poster mentioned), one for
initialization and one for multiplication.
2) In the second part, switch the order of k and j. This might
potentially increase your cache hit rate, which can have a significant
speed increase. (Because in this case you'd always be accessing
consecutive memory locations, i.e., only changing the last index.)

The resulting code would be:

for (i=0;i<3;i++) {
for (j=0;j<3;j++) {
w[j]=0;
}
}
for (i=0;i<3;i++) {
for (k=0;k<40000;k++) {
for (j=0;j<3;j++) {
w[j] += xyz_trans[k] * xyz[k][j];
}
}
}

3) In the off chance you haven't already, set your compiler's flags to
maximum optimization.
4) Maybe introduce temporary variables for some things (hopefully
compiler optimization would catch those). I probably wouldn't do
this, but the code would look something like this (second part only):
for (i=0;i<3;i++) {
double* wp = w;
for (k=0;k<40000;k++) {
double* xyzp = xyz[k];
double xyz_trans_val = xyz_trans[k];
for (j=0;j<3;j++) {
wp[j] += xyz_trans_val * xyzp[j];
}
}
}
As I said, I wouldn't do it because I expect my compiler to do this
kind of thing for me. If I were really desparate, I might look at the
generated assembly to figure out if my compiler is actually doing
that.

Finally, as mentioned, I haven't tested this, so it might actually
make the code slower.

And it looks like xyz_trans might be the same as xyz transposed, in
which case you could rewrite the whole thing with only xyz and without
creating a separate array.

Michael

P.S. Matrix multiplication as written is N^3 if you multiply NxN and
NxN matrices. There's a better algorithm that's N^2.5 or something
like that, but since one of your dimensions is 3, it's not likely to
help much (and it is a significant pain to implement).
 
U

user923005

Hi all,

Is there any method which can implememt the matrix multiplication
faster than using the formula as we often do by hand?
I am writing the following code and my matrice: one is 3x40000 and the
other one 40000x3.
the program runs too slow:

/*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/

for (i=0;i<3;i++)
{
for (j=0;j<3;j++)
{
w[j]=0;
for (k=0;k<40000;k++)
{
w[j]+=xyz_trans[k]*xyz[k][j];
}

}
}

What is your code when you multiply 2 big matrice? Is there any faster
algorithm? I see MATLAB handles this task so quick.

Thanks


I posted a C Strassen matrix multiply in this thread:
http://groups.google.com/group/comp...b87e8427d4b?lnk=st&q=&rnum=2#fcbe9b87e8427d4b

There is a link to a C++ version in this thread:
http://groups.google.com/group/comp...01120cc246b?lnk=st&q=&rnum=1#bd54b01120cc246b
 
C

christian.bau

Hi all,

Is there any method which can implememt the matrix multiplication
faster than using the formula as we often do by hand?
I am writing the following code and my matrice: one is 3x40000 and the
other one 40000x3.
the program runs too slow:

/*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/

for (i=0;i<3;i++)
{
for (j=0;j<3;j++)
{
w[j]=0;
for (k=0;k<40000;k++)
{
w[j]+=xyz_trans[k]*xyz[k][j];
}

}
}

What is your code when you multiply 2 big matrice? Is there any faster
algorithm? I see MATLAB handles this task so quick.

Thanks


Your post is missing some important things: How exactly are i, j, k,
xyz, xyz_trans and w declared? How fast or slow exactly is the code,
on which processor? That information is needed to decide if there is
something significantly wrong with your code, or whether this is just
a bit slow because your code is quite clumsy, or whether this code is
actually as good as you could expect. I suggest you replace your code
with (I left out some declarations because you left out stuff that I
would need to know)
for (i=0;i<3;i++)
{ s0 = s1 = s2 = 0;
for (k=0;k<40000;k++)
{
t = xyz_trans[k];
p = &xyz[k][0];
s0 += t * p [0]; s1 _= t * p [1]; s2 += t * p [2];
w [0] = s0; w [1] = s1; w [2] = s2;

Measure the times and post it here, plus post the _complete_ source,
then we can have another look. About a millisecond on a modern desktop
computer seems reasonable.
 
R

Richard Harter

Hi all,

Is there any method which can implememt the matrix multiplication
faster than using the formula as we often do by hand?
I am writing the following code and my matrice: one is 3x40000 and the
other one 40000x3.
the program runs too slow:

/*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/

for (i=0;i<3;i++)
{
for (j=0;j<3;j++)
{
w[j]=0;
for (k=0;k<40000;k++)
{
w[j]+=xyz_trans[k]*xyz[k][j];
}

}
}

What is your code when you multiply 2 big matrice? Is there any faster
algorithm? I see MATLAB handles this task so quick.


Your problem is that you have lots of caches faults and perhaps even
page fault because the stride in one of the terms in the dot product
isn't equal to one. This being C code, the stride in xyz[k][j] is 3
rather than 1. This is a well known issue in matrix multiplication. If
feasible the solution is to transpose the matrix causing trouble first.

BTW this really isn't a C problem. Comp.programming or
sci.math.num-analysis might be more appropriate groups.
 
D

Dik T. Winter

> Is there any method which can implememt the matrix multiplication
> faster than using the formula as we often do by hand?
> I am writing the following code and my matrice: one is 3x40000 and the
> other one 40000x3.
> the program runs too slow:
>
> /*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/
> for (i=0;i<3;i++)
> {
> for (j=0;j<3;j++)
> {
> w[j]=0;
> for (k=0;k<40000;k++)
> {
> w[j]+=xyz_trans[k]*xyz[k][j];
> }
>
> }
> }


It depends on the processor, and I think also on caching. But you might
try the following:
for(i = 0; i < 3; i++) for(j = 0; j < 3; j++) w[j] = 0;
for(k = 0; k < 40000; k++) {
for(i = 0; i < 3; i++) {
xyzt = xyz_trans[k];
for(j = 0; j < 3; j++) {
w[j] += xyzt * xyz[k][j];
}
}
}

Under some circumstances it works better.
 
D

Dik T. Winter

> P.S. Matrix multiplication as written is N^3 if you multiply NxN and
> NxN matrices. There's a better algorithm that's N^2.5 or something
> like that, but since one of your dimensions is 3, it's not likely to
> help much (and it is a significant pain to implement).

It will not help at all. In order to benefit from it the matrices need
a reasonable size in all dimensions, otherwise it will be slower.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top