Performance issue: matrix multiplication in C and C++

Discussion in 'C++' started by Michael Bader, Mar 2, 2004.

  1. Hi,

    I'm currently working on a matrix multiplication code (matrix times
    matrix), and have come across some interesting/confusing results
    concerning the running time of the (apparently) same algorithm
    when implemented in C or C++. I noticed that the implementation
    in C is faster by a factor of 2.5 compared to an identical(?)
    C++ implementation.

    The basic algorithm in C is:

    /* in function main() */
    idxC = 0;
    for (i = 0; i < dimension; i++)
        for (k = 0; k < dimension; k++) {
            idxA = i*dimension;
            endA = idxA + dimension;
            idxB = k;
            while ( idxA < endA ) {
                Celem[idxC] += Aelem[idxA] * Belem[idxB];
                idxA++;
                idxB += dimension;
            }
            idxC++;
        }

    where matrices A, B, and C are implemented as 1-dimensional
    arrays with line-by-line numbering (i.e. Fortran-style).
    For example, idxA holds the index of the current element A(i,j) of
    matrix A. After idxA++, the index refers to element A(i,j+1).
    Similarly, idxA += dimension moves the index to element A(i+1,j).
    Thus, the code computes the matrix product C = A*B while avoiding
    explicit computation of matrix indices.

    The respective algorithm in C++ is:

    template<class MatElem>
    void FortranMatrix<MatElem>::mult(const FortranMatrix<MatElem>& B,
                                      FortranMatrix<MatElem>& C) const {
        /* ... */
        int idxC = 0;
        for (int i = 0; i < dimension; i++)
            for (int k = 0; k < dimension; k++) {
                int idxA = i*dimension;
                int endA = idxA + dimension;
                int idxB = k;
                while ( idxA < endA ) {
                    C.elem[idxC] += elem[idxA] * B.elem[idxB];
                    idxA++;
                    idxB += dimension;
                }
                idxC++;
            }
    }

    Here, A, B, and C are objects of a matrix class that has a
    member elem (of type double*).

    For matrices of size 729x729, the C implementation takes
    around 10 seconds on my machine (Pentium III, gcc version 3.2,
    optimization level -O3 and -funroll-loops).
    The C++-implementation takes approximately 25 seconds on the same
    machine using g++ (same version, same optimization parameters).

    Is anybody out there able to give a comment on the reasons for
    this huge loss of performance?

    Using a static function with the double arrays as parameters, i.e.

    template<class MatElem>
    void FortranMatrix<MatElem>::stamult(const MatElem* A,
                                         const MatElem* B,
                                         MatElem* C, int dimension) {
        /* ... */
        int idxC = 0;
        for (int i = 0; i < dimension; i++)
            for (int k = 0; k < dimension; k++) {
                int idxA = i*dimension;
                int endA = idxA + dimension;
                int idxB = k;
                while ( idxA < endA ) {
                    C[idxC] += A[idxA] * B[idxB];
                    idxA++;
                    idxB += dimension;
                }
                idxC++;
            }
    }

    Now, this is almost identical to the C algorithm, but the code still
    takes around 20 seconds. I was prepared to see certain performance
    losses when using C++ instead of C. However, I find it confusing to
    lose a factor of 2 in code that doesn't even come close to using
    object-oriented features that might slow it down.
    I didn't try it without templates, but that can't make too much of
    a difference, can it?

    I'd appreciate any comment on this, and I would like to thank you
    very much for taking the time to read this rather longish article :)

    Michael
     
    Michael Bader, Mar 2, 2004
    #1

  2. Michael Bader wrote:
    ....
    >
    > The respective algorithm in C++ is:
    >
    > template<class MatElem>
    > void FortranMatrix<MatElem>::mult(const FortranMatrix<MatElem>& B,
    >                                   FortranMatrix<MatElem>& C) const {
    >     /* ... */
    >     int idxC = 0;
    >     for (int i = 0; i < dimension; i++)
    >         for (int k = 0; k < dimension; k++) {
    >             int idxA = i*dimension;
    >             int endA = idxA + dimension;
    >             int idxB = k;
    >             while ( idxA < endA ) {
    >                 C.elem[idxC] += elem[idxA] * B.elem[idxB];


    C.elem[idxC]

    You can probably hoist this out of the loop.

    The compiler probably can't optimize the repeated fetch through the
    "elem" member, so you'll need to do it yourself.

    Try something like:

    template<class MatElem>
    void FortranMatrix<MatElem>::mult(const FortranMatrix<MatElem>& B,
                                      FortranMatrix<MatElem>& C) const {
        /* ... */
        int idxC = 0;
        for (int i = 0; i < dimension; i++)
            for (int k = 0; k < dimension; k++) {
                int idxA = i*dimension;
                int endA = idxA + dimension;
                int idxB = k;
                MatElem v = 0;
                while ( idxA < endA ) {
                    v += elem[idxA] * B.elem[idxB];
                    idxA++;
                    idxB += dimension;
                }
                C.elem[idxC] = v;
                idxC++;
            }
    }

    -- Still, you can also fetch the addresses this->elem, B.elem and C.elem
    and place them in temporaries (although the compiler can probably do it
    for this->elem and B.elem, since *this and B are const, but not for C).
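Both suggestions, the local accumulator and hoisting the element arrays out of the objects, can be combined in a standalone sketch (a hypothetical free function on raw pointers so it compiles on its own; `mult_hoisted` is my name, not part of the thread's class):

```cpp
#include <vector>

// Sketch combining the two suggestions above: accumulate into a local
// 'v' (one store per result element instead of one per loop iteration),
// and pass the element arrays as plain pointers so the compiler never
// has to re-fetch them through a member. Hypothetical free function.
template <class MatElem>
void mult_hoisted(const MatElem* aElem, const MatElem* bElem,
                  MatElem* cElem, int dimension) {
    int idxC = 0;
    for (int i = 0; i < dimension; i++)
        for (int k = 0; k < dimension; k++) {
            int idxA = i * dimension;   // start of row i of A
            int endA = idxA + dimension;
            int idxB = k;               // walks down column k of B
            MatElem v = 0;              // local accumulator
            while (idxA < endA) {
                v += aElem[idxA] * bElem[idxB];
                idxA++;
                idxB += dimension;
            }
            cElem[idxC] = v;            // single store per result element
            idxC++;
        }
}
```

For 2x2 inputs A = {1,2; 3,4} and B = {5,6; 7,8} stored row by row, this yields C = {19,22; 43,50}.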


    >                 idxA++;
    >                 idxB += dimension;
    >             }
    >             idxC++;
    >         }
    > }
    >
    > [object description, timings, and the static-function version snipped]


    Which version of gcc are you using?



    Try posting compilable code. The last example really does look like it
    should be at least as fast as the C code. Without examining the actual
    generated assembly code, it's hard to know why it's doing what it's doing.

    G
     
    Gianni Mariani, Mar 2, 2004
    #2

  3. Michael Bader <> wrote in
     news:c22ga6$m2a$-muenchen.de:

    > [original post snipped]
    >
    > Here, A, B, and C, are objects of some matrix class that have a
    > member elem (of type double*).


    Show, don't tell. Where is the actual class declaration of
    FortranMatrix<MatElem>? What _is_ MatElem? Show its declaration too.
     
    Andre Kostur, Mar 2, 2004
    #3
  4. "Michael Bader" <> wrote
    > [original post snipped]


    The most likely reason is that g++ doesn't generate intermediate code
    that's as good as gcc's (in fact, g++ 3.x generates bigger and slower
    code than g++ 2.95). This is fairly common. The worst difference I've
    seen was with Watcom's C and C++ compilers in the early '90s, where some
    code ran 10 times or more slower when compiled with their C++ compiler
    than with their C compiler. This isn't a difference between the C and
    C++ languages, but between the two compiler implementations. C++
    compilers just don't have as long a history as C compilers, and in the
    time that C++ compiler writers have had to spend adding and supporting
    the myriad features that C doesn't have, the C compiler writers have
    been able to invest in better code generation. C++ will eventually catch
    up (today's C++ compilers compare favorably with C compilers of 10 years
    ago). In the meantime, you can certainly write performance-critical code
    in C and link it into your C++ application.
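That last route, keeping the hot loop in C and calling it from C++, hinges on `extern "C"` linkage. A sketch (the function name `mat_mult` and the file split are illustrative, not from the thread; shown here as one translation unit so the sketch is self-contained):

```cpp
// In a shared header: extern "C" gives the function C linkage, so an
// object file produced by gcc and a C++ caller compiled by g++ agree on
// the symbol name. In practice the definition below would live in its
// own .c file compiled by the C compiler.
#ifdef __cplusplus
extern "C" {
#endif
void mat_mult(const double* A, const double* B, double* C, int dimension);
#ifdef __cplusplus
}
#endif

// Definition: the thread's algorithm, with C zeroed first so the
// function does not depend on the caller pre-initializing it.
void mat_mult(const double* A, const double* B, double* C, int dimension) {
    int idxC = 0;
    for (int i = 0; i < dimension; i++)
        for (int k = 0; k < dimension; k++) {
            int idxA = i * dimension;
            int endA = idxA + dimension;
            int idxB = k;
            C[idxC] = 0;
            while (idxA < endA) {
                C[idxC] += A[idxA] * B[idxB];
                idxA++;
                idxB += dimension;
            }
            idxC++;
        }
}
```

The header would then be included by both the C translation unit and the C++ callers.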

    Claudio Puviani
     
    Claudio Puviani, Mar 2, 2004
    #4
  5. "Gianni Mariani" <> wrote in message
     news:c22kkr$...
    > Michael Bader wrote:
    > > [original C++ mult() snipped]
    >
    > [Gianni's suggested rewrite, accumulating into a local v, snipped]

    I would also move

    > int idxA = i*dimension;
    > int endA = idxA + dimension;

    out to the outer loop... no need to redeclare and initialise them each
    time through the loop over k. Keep what depends upon the loop variable
    within the loop, and move everything else out.
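That hoisting might look like the sketch below (`rowBase` and `rowEnd` are my names, and the free-function form is hypothetical). One caveat: idxA itself is mutated by the inner while loop, so only the row start and end can move out; each pass over k still needs a fresh copy:

```cpp
#include <vector>

// Sketch of hoisting the row bounds out of the loop over k: rowBase and
// rowEnd depend only on i, so they are computed once per row of A.
// idxA must still be reset for every k, because the while loop below
// advances it all the way to rowEnd.
template <class MatElem>
void mult_rowhoist(const MatElem* A, const MatElem* B,
                   MatElem* C, int dimension) {
    int idxC = 0;
    for (int i = 0; i < dimension; i++) {
        const int rowBase = i * dimension;   // computed once per i
        const int rowEnd  = rowBase + dimension;
        for (int k = 0; k < dimension; k++) {
            int idxA = rowBase;              // fresh copy per k
            int idxB = k;
            MatElem v = 0;
            while (idxA < rowEnd) {
                v += A[idxA] * B[idxB];
                idxA++;
                idxB += dimension;
            }
            C[idxC] = v;
            idxC++;
        }
    }
}
```

(As Michael reports later in the thread, the measured gain from this particular change was negligible; the optimizer evidently performs it on its own.)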
     
    NKOBAYE027, Mar 2, 2004
    #5
  6. Michael Bader wrote:
    > Hi,
    >
    > I'm currently working on a matrix multiplication code (matrix times
    > matrix), and have come along some interesting/confusing results
    > concerning the running time of the (apparently) same algorithm,
    > when implemented in C or C++. I noticed that the implementation
    > in C is faster by a factor of 2.5 compared to a identical(?)
    > C++-implementation.
    >
    > The basic algorithm in C is:
    >


    somewhere up here you declare idxA, endA, idxB, etc. ONCE

    > /* in function main() */
    > idxC = 0;
    > for (i = 0; i < dimension; i++)
    >     for (k = 0; k < dimension; k++) {
    >         idxA = i*dimension;
    >         endA = idxA + dimension;
    >         idxB = k;
    >         while ( idxA < endA ) {
    >             Celem[idxC] += Aelem[idxA] * Belem[idxB];
    >             idxA++;
    >             idxB += dimension;
    >         }
    >         idxC++;
    >     }
    >
    > [explanation snipped]
    >
    > The respective algorithm in C++ is:
    >
    > template<class MatElem>
    > void FortranMatrix<MatElem>::mult(const FortranMatrix<MatElem>& B,
    >                                   FortranMatrix<MatElem>& C) const {
    >     /* ... */
    >     int idxC = 0;
    >     for (int i = 0; i < dimension; i++)
    >         for (int k = 0; k < dimension; k++) {


    >             int idxA = i*dimension;
    >             int endA = idxA + dimension;
    >             int idxB = k;

    here you declare each of these variables (and assign to them) dimension
    squared times. I'd guess doing this amounts to moving the stack
    pointer around a lot more than you have to.

    Move your declarations outside the loop; keep the assignments
    where you need them, though.


    >             while ( idxA < endA ) {
    >                 C.elem[idxC] += elem[idxA] * B.elem[idxB];
    >                 idxA++;
    >                 idxB += dimension;
    >             }
    >             idxC++;
    >         }
    > }
    >
    > [rest of post snipped]


    guessingly yours,
    Jason
     
    Jason Leasure, Mar 2, 2004
    #6
  7. Andre Kostur wrote:
    > Show, don't tell. Where is the actual class declaration of
    > FortranMatrix<MatElem>? What _is_ MatElem? Show its declaration too.


    No problem, I just felt that the posting was already lengthy enough
    without it ...

    template<class MatElem> class FortranMatrix {

    public:

        // Constructors
        FortranMatrix(int dim);
        FortranMatrix(const FortranMatrix<MatElem>& C);

        // element access
        // --> direct access by memory index
        inline MatElem& operator[] (int index) const;
        inline MatElem& operator[] (int index);

        // --> access via line index and column index
        inline MatElem& operator() (int line, int col) const;
        inline MatElem& operator() (int line, int col);

        inline void mult(const FortranMatrix<MatElem>& B,
                         FortranMatrix<MatElem>& C) const;

        /* ... additional methods skipped ... */

    public:

        // Members
        MatElem* elem;   // matrix elements
        int dimension;   // number of lines (also number of columns)

    };

    MatElem is instantiated as double.

    Regards

    Michael
     
    Michael Bader, Mar 3, 2004
    #7
  8. Jason Leasure wrote:
    >> int idxC = 0;
    >> for(int i=0; i< dimension; i++)
    >> for(int k=0; k< dimension; k++) {

    >
    >> int idxA = i*dimension;
    >> int endA = idxA + dimension;
    >> int idxB = k;

    > here you declare each of these variables (and assign to them) dimension
    > squared times.. I'd guess doing this amounts to moving the stack
    > pointer around a lot more than you have to.


    I gave that a try, but the performance gain is almost negligible.
    Probably, the optimizer can figure that out on its own.

    Thanks anyway

    Michael
     
    Michael Bader, Mar 3, 2004
    #8
  9. Claudio Puviani wrote:
    > The most likely reason is that g++ doesn't generate intermediate code
    > that's as good as gcc (in fact, g++ 3.x generates bigger and slower code
    > than g++ 2.95).


    Hi,

    thanks for the suggestion, I'll definitely try to do the same
    comparison using an older version of gcc/g++.
    I also wanted to use an Intel compiler, but I need to ask
    someone who owns a licence for that ...

    Thanks

    Michael
     
    Michael Bader, Mar 3, 2004
    #9
  10. Gianni Mariani wrote:
    > Try posting compilable code. The last example really does look like it
    > should be at least as fast as the C code. Without examining the actual
    > generated assembly code, it's hard to know why it's doing what it's doing.


    I've put the code for download on
    http://www5.in.tum.de/~bader/mmult/
    (directory listing should be allowed).

    You need to set the flag __TEMPLATES_IN_HEADERS for compiling
    (for g++, use option -D__TEMPLATES_IN_HEADERS).

    If you can use a different compiler/architecture, the results could
    also be interesting.

    Thanks in advance

    Michael
     
    Michael Bader, Mar 3, 2004
    #10
  11. Gianni Mariani wrote:
    >
    > C.elem[idxC]
    >
    > You can probably remove this from the loop.
    >
    > The compiler probably can't optimize the fetch of the "elem" array, so
    > you'll need to do it yourself.


    Well, basically, that was the reason why I tried the "static" version
    of the algorithm (third code example in my previous posting).
    There, A, B, and C are no longer objects, but plain double arrays.
    It actually leads to a performance gain (20 sec instead of 25 sec),
    but there's still a factor of 2 compared to the C program.

    Regards

    Michael
     
    Michael Bader, Mar 3, 2004
    #11
  12. Gianni Mariani wrote:
    > /* ... */
    > int idxC = 0;
    > for (int i = 0; i < dimension; i++)
    >     for (int k = 0; k < dimension; k++) {
    >         int idxA = i*dimension;
    >         int endA = idxA + dimension;
    >         int idxB = k;
    >         MatElem v = 0;
    >         while ( idxA < endA ) {
    >             v += elem[idxA] * B.elem[idxB];
    >             idxA++;
    >             idxB += dimension;
    >         }
    >         C.elem[idxC] = v;
    >         idxC++;
    >     }
    > }


    Sorry, I nearly missed your suggestion in that code.
    You actually got the solution here: the modification reduces
    the running time to approx. 10 sec in C++.

    So using

    MatElem v = 0;
    while ( idxA < endA ) {
        v += elem[idxA] * B.elem[idxB];
        idxA++;
        idxB += dimension;
    }
    C.elem[idxC] += v;

    instead of

    while ( idxA < endA ) {
        C.elem[idxC] += elem[idxA] * B.elem[idxB];
        idxA++;
        idxB += dimension;
    }

    reduces the execution time by a factor of 2 (when using g++).
    However, when using gcc (i.e. C), the same modification doesn't
    make any difference, so it clearly seems to be an optimization
    problem.
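A plausible explanation, not spelled out in the thread, is aliasing: with `C.elem[idxC] +=` in the loop body, g++ must assume the store through `C.elem` could overlap `elem` or `B.elem`, so it re-reads and re-stores memory on every iteration; the local `v` lives in a register and removes that store from the loop. The (non-standard, but GCC-supported) `__restrict__` qualifier states the no-overlap promise explicitly. A sketch of that hypothetical alternative:

```cpp
// Hypothetical variant: __restrict__ (a GCC/Clang extension in C++)
// promises that A, B and C never overlap, which in principle lets the
// compiler keep C[idxC] in a register without the manual accumulator.
// Whether this matches the thread's g++ 3.2 behavior is an assumption;
// only inspecting the generated assembly would confirm it.
void mult_restrict(const double* __restrict__ A,
                   const double* __restrict__ B,
                   double* __restrict__ C, int dimension) {
    int idxC = 0;
    for (int i = 0; i < dimension; i++)
        for (int k = 0; k < dimension; k++) {
            int idxA = i * dimension;
            int endA = idxA + dimension;
            int idxB = k;
            C[idxC] = 0;
            while (idxA < endA) {
                C[idxC] += A[idxA] * B[idxB];
                idxA++;
                idxB += dimension;
            }
            idxC++;
        }
}
```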

    Thanks for your help

    Michael
     
    Michael Bader, Mar 3, 2004
    #12
