Matrix multiply code

  • Thread starter: christopher diggins

christopher diggins

Here is some code I wrote for matrix multiplication with arbitrary
dimensions known at compile time. I am curious how practical it is. For
instance, is it common to know the dimensions of matrices at compile
time? Any help would be appreciated. Hopefully this code comes in useful
for someone; let me know if you find it useful, or if you have
suggestions on how to improve it.

// Public Domain by Christopher Diggins, 2005

#include <iostream>
#include <numeric>
#include <valarray>
#include <cassert>

template<class Value_T, unsigned int Size_N, unsigned int Stride_N>
class Slice
{
public:
// constructor
Slice(Value_T* x) : p(x) { }
// typedef
typedef Value_T value_type;
// only a forward iterator is provided,
// others are left as an exercise for the reader
struct Slice_forward_iter {
Slice_forward_iter(Value_T* x) : p(x) { }
Value_T& operator*() { return *p; }
const Value_T& operator*() const { return *p; }
// prefix: advance, then return the iterator itself
Slice_forward_iter& operator++() { p += Stride_N; return *this; }
// postfix: advance, but return a copy of the old position
Slice_forward_iter operator++(int) { Slice_forward_iter tmp(*this); p += Stride_N; return tmp; }
bool operator==(const Slice_forward_iter& x) const { return p == x.p; }
bool operator!=(const Slice_forward_iter& x) const { return p != x.p; }
Value_T* p;
};
typedef Slice_forward_iter iterator;
typedef const Slice_forward_iter const_iterator;
// public functions
iterator begin() { return iterator(p); }
iterator end() { return iterator(p + (Size_N * Stride_N)); }
value_type& operator[](size_t n) { return *(p + (n * Stride_N)); }
const value_type& operator[](size_t n) const { return *(p + (n * Stride_N)); }
static size_t size() { return Size_N; }
static size_t stride() { return Stride_N; }
private:
// prevent default construction
Slice() { }
Value_T* p;
};

template<class Value_T, unsigned int Rows_N, unsigned int Cols_N>
class Matrix
{
public:
typedef Slice<Value_T, Cols_N, 1> row_type;
typedef Slice<Value_T, Rows_N, Cols_N> col_type;
const row_type row(unsigned int n) const {
assert(n < rows);
return row_type(data + (n * Cols_N));
}
const col_type column(unsigned int n) const {
assert(n < cols);
return col_type(data + n);
}
row_type operator[](unsigned int n) { return row(n); }
const row_type operator[](unsigned int n) const { return row(n); }
const static unsigned int rows = Rows_N;
const static unsigned int cols = Cols_N;
typedef Value_T value_type;
private:
mutable Value_T data[Rows_N * Cols_N];
};

template<class Matrix1, class Matrix2>
struct MatrixMultiplicationType {
typedef Matrix<typename Matrix1::value_type, Matrix1::rows, Matrix2::cols> type;
};

template<class Slice1_T, class Slice2_T>
typename Slice1_T::value_type dot_product(Slice1_T x, Slice2_T y) {
assert(x.size() == y.size());
return std::inner_product(x.begin(), x.end(), y.begin(), typename Slice1_T::value_type(0));
}

template<class Matrix1, class Matrix2>
typename MatrixMultiplicationType<Matrix1, Matrix2>::type
matrix_multiply(const Matrix1& m1, const Matrix2& m2)
{
typename MatrixMultiplicationType<Matrix1, Matrix2>::type result;
assert(Matrix1::cols == Matrix2::rows);
for (int i=0; i < Matrix1::rows; ++i)
for (int j=0; j < Matrix2::cols; ++j)
result[i][j] = dot_product(m1.row(i), m2.column(j));
return result;
}

template<typename Matrix_T>
void output_matrix(const Matrix_T& x) {
for (int i=0; i < x.rows; ++i) {
for (int j=0; j < x.cols; ++j) {
std::cout << x[i][j] << ", ";
}
std::cout << std::endl;
}
std::cout << std::endl;
}

void test_matrix() {
Matrix<int, 1, 2> m1;
m1[0][0] = 2;
m1[0][1] = 3;
Matrix<int, 2, 2> m2;
m2[0][0] = 2;
m2[0][1] = 0;
m2[1][0] = 0;
m2[1][1] = 2;
output_matrix(m2);
Matrix<int, 1, 2> m3 = matrix_multiply(m1, m2);
output_matrix(m3);
}

int main() {
test_matrix();
system("pause");
return 0;
}
 

E. Robert Tisdale

christopher said:
[snip]

// Public Domain by Christopher Diggins, 2005

Unless you claim the copyright,
You have no right to give your code away
or even to use it yourself.
For more details, see:

http://www.gnu.org/licenses/licenses.html

You will probably want to use a copyleft:

http://www.gnu.org/licenses/licenses.html#WhatIsCopyleft
[code snipped]

g++ -Wall -ansi -pedantic -o main main.cpp
main.cpp: In function `void output_matrix(const Matrix_T&) [with Matrix_T = Matrix<int, 2u, 2u>]':
main.cpp:112:   instantiated from here
main.cpp:94: warning: comparison between signed and unsigned integer expressions
main.cpp:95: warning: comparison between signed and unsigned integer expressions
main.cpp: In function `typename MatrixMultiplicationType<Matrix1, Matrix2>::type matrix_multiply(const Matrix1&, const Matrix2&) [with Matrix1 = Matrix<int, 1u, 2u>, Matrix2 = Matrix<int, 2u, 2u>]':
main.cpp:113:   instantiated from here
main.cpp:86: warning: comparison between signed and unsigned integer expressions
main.cpp:87: warning: comparison between signed and unsigned integer expressions
main.cpp: In function `void output_matrix(const Matrix_T&) [with Matrix_T = Matrix<int, 1u, 2u>]':
main.cpp:114:   instantiated from here
main.cpp:94: warning: comparison between signed and unsigned integer expressions
main.cpp:95: warning: comparison between signed and unsigned integer expressions

Evidently, your code needs lots of work.
Also, the design is still very poor.
See The Object-Oriented Numerics Page

http://www.oonumerics.org/oon/

to get an idea about how the experts do this.

Good Luck.
 

Donovan Rebbechi

Here is some code I wrote for Matrix multiplication for arbitrary
dimensionality known at compile-time. I am curious how practical it is. For
instance, is it common to know the dimensionality of matrices at
compile-time?

There are some matrices where this is known. It depends on what the matrix
models. If the matrix models some transform, e.g. a 3d coordinate transform,
or a linear transform between two known lists of variables, then the
dimensions may well be known at compile time.

But if the matrix is a data set of some sort (example: some sort of collection
of samples or measurements), the dimensions probably will not be known at
compile time.
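To illustrate the transform case, a minimal sketch (the type and names here are my own, not from the code above): with the dimensions as template parameters, applying a mismatched transform is a compile error rather than a runtime one.

```cpp
#include <cassert>

// Fixed-size matrix; dimensions live in the type.
template<unsigned R, unsigned C>
struct Mat { double a[R][C]; };

// Applying a 2x2 transform to a 2-vector: the sizes are checked by the
// type system, so e.g. passing a Mat<3, 1> would not compile.
Mat<2, 1> apply(const Mat<2, 2>& m, const Mat<2, 1>& v) {
    Mat<2, 1> r;
    for (unsigned i = 0; i < 2; ++i)
        r.a[i][0] = m.a[i][0] * v.a[0][0] + m.a[i][1] * v.a[1][0];
    return r;
}
```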

Cheers,
 

christopher diggins

E. Robert Tisdale said:
Unless you claim the copyright,

In many countries (including Canada and the US), copyright doesn't have to
be claimed to be attributed.
see http://en.wikipedia.org/wiki/Copyright#Copyright_notices and
http://www.templetons.com/brad/copymyths.html
You have no right to give your code away
or even to use it yourself.
For more details, see:

http://www.gnu.org/licenses/licenses.html

You will probably want to use a copyleft:

http://www.gnu.org/licenses/licenses.html#WhatIsCopyleft

No thank you, I would prefer contributing something to the public domain
rather than restricting usage.
g++ -Wall -ansi -pedantic -o main main.cpp
[warnings snipped]

Evidently, your code needs lots of work.

Simply because I overlooked a few signed/unsigned comparison warnings? I
don't doubt my code has errors and needs work, but pedantic warnings from
GCC do not indicate where work is actually needed. I know you have much
more to offer me than that ;-)
Also, the design is still very poor.

Could you be more specific? I do appreciate criticism, but I need something
more tangible.
See The Object-Oriented Numerics Page

http://www.oonumerics.org/oon/

to get an idea about how the experts do this.

Thank you, I have looked closely at several libraries, and my original
questions still stand.
Good Luck.

Thank you.

- Christopher Diggins
 

E. Robert Tisdale

christopher said:
[snip]
Thank you, I have looked closely at several libraries, and my original
questions still stand.

Take a look at the blitz++ Tiny Vector Matrix library

http://www.oonumerics.org/blitz/

You can also look at the Array classes in
The C++ Scalar, Vector, Matrix and Tensor class Library

http://www.netwood.net/~edwin/svmtl/

I also have an old recursive template definitions
for Matrix classes that you can get from

ftp.cs.ucla.edu/pub/Tensor.tar.Z

All of these things exist
mainly to help answer the questions that you have asked.

Strictly speaking, your questions are off-topic in this forum
but probably dead on in the object oriented numerics mailing list

http://www.oonumerics.org/mailman/listinfo.cgi/oon-list/

Try there. I don't think that you'll be disappointed.
 

christopher diggins

See The Object-Oriented Numerics Page

http://www.oonumerics.org/oon/

to get an idea about how the experts do this.

I was curious how an "expert" implementation compares, so I compared my code
to boost::ublas. Preliminary tests show that my implementation runs over
twice as fast, and I haven't done any profiling or optimizing whatsoever.
This makes sense, because the compiler can optimize the indexing using
constants rather than doing variable lookups. This still leaves my original
question: how practical is it? And do people often know matrix dimensions at
compile time?
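The constant-indexing point can be sketched as follows (function names here are illustrative, not from my library): with the column count as a template parameter, the element offset multiplies by a compile-time constant, which the compiler can strength-reduce or fold away; with a runtime column count it must use a variable.

```cpp
#include <cassert>

// With Cols_N known at compile time, r * Cols_N + c multiplies by a
// constant; the compiler can turn this into shifts/adds or fold it.
template<unsigned Cols_N>
int at(const int* data, unsigned r, unsigned c) {
    return data[r * Cols_N + c];   // Cols_N is a compile-time constant
}

// Runtime column count: the multiplier must be loaded from a variable.
int at_runtime(const int* data, unsigned cols, unsigned r, unsigned c) {
    return data[r * cols + c];
}
```

Both return the same values, of course; the difference is only in what the optimizer can do with the multiplier.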

For those interested, here is my first attempt at benchmarking:

#include <boost/progress.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>

// ... see previous post ...

using namespace boost::numeric::ublas;

template<class Number, unsigned int M, unsigned int R, unsigned int N,
unsigned int I>
void compare_naive_and_ublas()
{
Matrix<Number, M, N> result1;
matrix<Number> result2;
{
std::cout << "about to run Naive test " << std::endl;
Matrix<Number, M, R> m1;
Matrix<Number, R, N> m2;
int n = 0;
for (unsigned i = 0; i < M; ++i)
for (unsigned j = 0; j < R; ++j)
m1[i][j] = ++n;
n = 0;
for (unsigned i = 0; i < R; ++i)
for (unsigned j = 0; j < N; ++j)
m2[i][j] = ++n;
boost::progress_timer t;
for (int i=0; i < I; i++) {
result1 = matrix_multiply(m1, m2);
}
std::cout << "naive time elapsed " << t.elapsed() << std::endl;
}
{
std::cout << "about to run uBlas test " << std::endl;
matrix<Number> m1(M, R);
matrix<Number> m2(R, N);
int n = 0;
for (unsigned i = 0; i < M; ++i)
for (unsigned j = 0; j < R; ++j)
m1(i, j) = ++n;
n = 0;
for (unsigned i = 0; i < R; ++i)
for (unsigned j = 0; j < N; ++j)
m2(i, j) = ++n;
boost::progress_timer t;
for (int i=0; i < I; i++) {
result2 = prod(m1, m2);
}
std::cout << "boost time elapsed " << t.elapsed() << std::endl;
}
for (unsigned i = 0; i < M; ++i)
for (unsigned j = 0; j < N; ++j)
assert(result1[i][j] == result2(i, j));
}

int main() {
compare_naive_and_ublas<int,17,23,31,1000>();
system("pause");
return 0;
}
 

christopher diggins

christopher diggins said:
Here is some code I wrote for Matrix multiplication for arbitrary
dimensionality known at compile-time. [snip]

I should be saying rows and columns, not dimensionality.

Christopher Diggins
 

christopher diggins

Take a look at the blitz++ Tiny Vector Matrix library

This only works for very small matrices; my method works for arbitrarily
large matrices.
You can also look at the Array classes in
The C++ Scalar, Vector, Matrix and Tensor class Library

http://www.netwood.net/~edwin/svmtl/

I don't see how this code tracks the rows and columns at compile-time.
I also have an old recursive template definitions
for Matrix classes that you can get from

ftp.cs.ucla.edu/pub/Tensor.tar.Z

I can't open it.
All of these things exist
mainly to help answer the questions that you have asked.

Strictly speaking, your questions are off-topic in this forum
but probably dead on in the object oriented numerics mailing list

Okay let me rephrase it. I have presented a way to do very fast matrix
multiplication in C++ by representing the rows and columns as template
parameters. I use some simple template metaprogramming to compute the type
of the result of the matrix multiplication. The core of the technique is:

template<class Matrix1, class Matrix2>
struct MatrixMultiplicationType {
typedef Matrix<typename Matrix1::value_type, Matrix1::rows, Matrix2::cols> type;
};

template<class Slice1_T, class Slice2_T>
typename Slice1_T::value_type dot_product(Slice1_T x, Slice2_T y) {
assert(x.size() == y.size());
return std::inner_product(x.begin(), x.end(), y.begin(), typename Slice1_T::value_type(0));
}

template<class Matrix1, class Matrix2>
typename MatrixMultiplicationType<Matrix1, Matrix2>::type
matrix_multiply(const Matrix1& m1, const Matrix2& m2)
{
typename MatrixMultiplicationType<Matrix1, Matrix2>::type result;
assert(Matrix1::cols == Matrix2::rows);
for (int i=0; i < Matrix1::rows; ++i)
for (int j=0; j < Matrix2::cols; ++j)
result[i][j] = dot_product(m1.row(i), m2.column(j));
return result;
}

My question is: am I duplicating prior work and does anyone find this
interesting?
http://www.oonumerics.org/mailman/listinfo.cgi/oon-list/

Try there. I don't think that you'll be disappointed.

Thanks.
 

Mark P

christopher said:
Here is some code I wrote for Matrix multiplication for arbitrary
dimensionality known at compile-time. I am curious how practical it is. For
instance, is it common to know the dimensionality of matricies at
compile-time? Any help would be appreciated. Hopefully this code comes in
useful for someone, let me know if you find it useful, or if you have
suggestions on how to improve it.

You might want to look into Strassen's algorithm for matrix
multiplication, which is asymptotically faster than the conventional
(row-by-column) approach.
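For reference, Strassen's scheme trades eight block multiplications for seven products plus extra additions. A sketch of the 2x2 case (scalar entries here for simplicity; in practice the entries are sub-blocks, and this is my own illustration rather than code from the thread):

```cpp
#include <cassert>

// Strassen's seven products for a 2x2 multiply C = A * B.
void strassen2x2(const int A[2][2], const int B[2][2], int C[2][2]) {
    int p1 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1]);
    int p2 = (A[1][0] + A[1][1]) * B[0][0];
    int p3 = A[0][0] * (B[0][1] - B[1][1]);
    int p4 = A[1][1] * (B[1][0] - B[0][0]);
    int p5 = (A[0][0] + A[0][1]) * B[1][1];
    int p6 = (A[1][0] - A[0][0]) * (B[0][0] + B[0][1]);
    int p7 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1]);
    // recombine the seven products into the four result entries
    C[0][0] = p1 + p4 - p5 + p7;
    C[0][1] = p3 + p5;
    C[1][0] = p2 + p4;
    C[1][1] = p1 - p2 + p3 + p6;
}
```

Applied recursively to sub-blocks, the seven-instead-of-eight recursion gives the O(n^log2(7)) bound.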
 

Ufit

christopher diggins said:
Here is some code I wrote for Matrix multiplication for arbitrary
dimensionality known at compile-time. I am curious how practical it is. For
---------

BTW, if I may interject: these days it is mostly done in assembly language to
boost performance. I'd like to see some benchmark comparisons against
normal C++ algorithms, because I'd say there's not much difference.

UF
 

christopher diggins

Of course! :)
- now mostly they do it in assembly language to
boost the performance. I'd like to see some bench comparisons for
normal C++ algorithms because I'd say there's not much difference.

UF

Here are the results for 1000 multiplications of a 17x23 matrix by a 23x31
matrix of integers, my code compared to ublas, compiled on Visual C++ 7.1:

about to run Naive test
1.33 s

about to run uBlas test
3.38 s

(YMMV)

There are probably optimizations left to do on the code. However, the code
is mostly pointer arithmetic, so there might not be that much improvement to
be gained by doing it in assembly, though I don't know yet. Any optimization
suggestions would be welcome.
 

christopher diggins

Mark P said:
You might want to look into Strassen's algorithm for matrix multiplication
which is asymptotically faster than the conventional (row.column)
approach.

That is great stuff, thanks for pointing that out. Even though it is only
applicable to n x n matrices, it can still be leveraged by using partial
specializations. What could be done is something like the following (not
tested):

template<class Matrix1, class Matrix2>
struct MatrixMultiplicationType {
typedef Matrix<typename Matrix1::value_type, Matrix1::rows, Matrix2::cols> type;
};

// function templates cannot be partially specialized, so dispatch on a
// bool-to-type tag instead
template<bool B> struct Bool2Type { };

template<class Matrix1, class Matrix2>
typename MatrixMultiplicationType<Matrix1, Matrix2>::type
matrix_multiply_impl(const Matrix1& m1, const Matrix2& m2, Bool2Type<false>)
{
// normal multiplication algorithm
}

template<class Matrix1, class Matrix2>
typename MatrixMultiplicationType<Matrix1, Matrix2>::type
matrix_multiply_impl(const Matrix1& m1, const Matrix2& m2, Bool2Type<true>)
{
// Strassen multiplication algorithm
}

template<class Matrix1, class Matrix2>
typename MatrixMultiplicationType<Matrix1, Matrix2>::type
matrix_multiply(const Matrix1& m1, const Matrix2& m2)
{
return matrix_multiply_impl(m1, m2,
Bool2Type<(Matrix1::rows == Matrix1::cols)
&& (Matrix1::cols == Matrix2::rows)
&& (Matrix2::rows == Matrix2::cols)>());
}

This brings to mind possibilities for other special cases of matrices.
 

Jeff Flinn

....
My question is: am I duplicating prior work and does anyone find this
interesting?

IIRC, Dave Abrahams is currently working on Vector/Matrix Linear Algebra
topics over at boost.

Jeff Flinn
 

Mark P

christopher said:
Upon further research it is important to point out to the interested reader
that Strassen's is only useful for multiplication of square matrices for n
greater than 654.

It's not quite so bad.

That result of 654 is obtained by assuming that Strassen is applied
recursively all the way down to matrices of size 1. This is actually a
pretty horrible idea :) It's much better to apply Strassen down to a
certain minimum size n0, and then use conventional multiplication for
these small matrices.
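That cutoff strategy might be sketched like this (a simplified version for n a power of two; the helper names and the row-major std::vector representation are mine, not the poster's code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Mtx;   // n*n entries, row-major

// conventional O(n^3) triple-loop multiply, used below the cutoff
Mtx mul_conv(const Mtx& A, const Mtx& B, int n) {
    Mtx C(n * n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
    return C;
}

Mtx madd(const Mtx& A, const Mtx& B) {
    Mtx C(A);
    for (std::size_t i = 0; i < C.size(); ++i) C[i] += B[i];
    return C;
}

Mtx msub(const Mtx& A, const Mtx& B) {
    Mtx C(A);
    for (std::size_t i = 0; i < C.size(); ++i) C[i] -= B[i];
    return C;
}

// extract / insert the (r, c) quadrant of an n x n matrix
Mtx quad(const Mtx& A, int n, int r, int c) {
    int h = n / 2;
    Mtx Q(h * h);
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < h; ++j)
            Q[i * h + j] = A[(i + r * h) * n + (j + c * h)];
    return Q;
}

void put(Mtx& A, int n, int r, int c, const Mtx& Q) {
    int h = n / 2;
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < h; ++j)
            A[(i + r * h) * n + (j + c * h)] = Q[i * h + j];
}

// Strassen with cutoff n0: recurse on quadrants, fall back to the
// conventional algorithm once the blocks are small enough.
Mtx strassen(const Mtx& A, const Mtx& B, int n, int n0) {
    if (n <= n0) return mul_conv(A, B, n);
    int h = n / 2;
    Mtx a11 = quad(A, n, 0, 0), a12 = quad(A, n, 0, 1);
    Mtx a21 = quad(A, n, 1, 0), a22 = quad(A, n, 1, 1);
    Mtx b11 = quad(B, n, 0, 0), b12 = quad(B, n, 0, 1);
    Mtx b21 = quad(B, n, 1, 0), b22 = quad(B, n, 1, 1);
    // the seven recursive products
    Mtx m1 = strassen(madd(a11, a22), madd(b11, b22), h, n0);
    Mtx m2 = strassen(madd(a21, a22), b11, h, n0);
    Mtx m3 = strassen(a11, msub(b12, b22), h, n0);
    Mtx m4 = strassen(a22, msub(b21, b11), h, n0);
    Mtx m5 = strassen(madd(a11, a12), b22, h, n0);
    Mtx m6 = strassen(msub(a21, a11), madd(b11, b12), h, n0);
    Mtx m7 = strassen(msub(a12, a22), madd(b21, b22), h, n0);
    Mtx C(n * n);
    put(C, n, 0, 0, madd(msub(madd(m1, m4), m5), m7));
    put(C, n, 0, 1, madd(m3, m5));
    put(C, n, 1, 0, madd(m2, m4));
    put(C, n, 1, 1, madd(msub(madd(m1, m3), m2), m6));
    return C;
}
```

A production version would avoid all the temporary copies; this only shows where the cutoff slots into the recursion.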

From my own calculations, the theoretically optimal choice is to apply
Strassen down to a size of about 16. In practice, due to non-algorithmic
overhead (memory access patterns, extra copying, etc.), this number may
be a few times larger. I implemented this once in Java and found n0
around 45, but my implementation may not have been ideal (perhaps a
certainty, given that I used Java :).

The other point is that the algorithm can be extended to deal with
non-square matrices without great difficulty. Assuming M < N, an MxN
matrix can be viewed as floor(N/M) adjacent MxM square matrices and an
adjacent Mx(N%M) "non-square" matrix. You can use Strassen to multiply
all combinations of the square matrices and then deal with the
combinations involving non-square matrices separately (either recurse or
use conventional multiply).

Finally, you may want to consider asking about this on a group with
algorithms in the name. Surely you'll find people with better practical
knowledge of this than I can offer.

-Mark
 

Lionel B

As a scientific programmer, my answer to the "usefulness" of efficient matrix multiply (and other linear algebra
routines) for compile-time-known dimensionality would probably be something like: very useful for small matrices (e.g.
transforms in 3, 4, ... dimensions), less so for large matrices, since in my experience large matrix dimensions tend to
relate to problem-size parameters and/or data set sizes. One wants to be able to experiment with both of the latter,
and a recompile per experiment would certainly be undesirable.
Of course! :)


/.../

There are probably optimizations left to do on the code. However, the code
is mostly pointer arithmetic, so there might not be that much improvement to
be gained by doing it in assembly, though I don't know yet. Any optimization
suggestions would be welcome.

For small matrices, I suspect something like the Blitz++ approach (basically unrolling all loops) is probably about as
good as it gets. For large matrix multiply everything ultimately boils down, as you say, to pointer chasing. In either
case, further optimisation can probably only be achieved by exploitation of CPU features - pipelining, minimising cache
misses, intelligent register allocation, that sort of thing. The established benchmark for (large) matrix multiply is
probably the level 3 BLAS gemm() routines. The most efficient BLAS implementations tend to be the vendor-supplied ones
(frequently written in assembler and/or Fortran) which achieve performance through very architecture-specific
techniques. I've also found the ATLAS (Automatically Tuned Linear Algebra Software) implementation
(http://math-atlas.sourceforge.net/) of the BLAS to produce very fast matrix multiply - many times faster than I've ever
managed to code it myself in C or C++. If you're interested in further optimisation techniques, the ATLAS white paper
http://www.netlib.org/lapack/lawns/lawn147.ps may be a good place to start.

Regards,
 
