# An error of matrix inversion using NumPy

Discussion in 'Python' started by lancered, Apr 4, 2007.

1. ### lanceredGuest

Hi dear all,
I am using Python2.4.2+NumPy1.0.1 to deal with a parameter
estimation problem with the least square methods. During the
calculations, I use NumPy package to deal with matrix operations,
mostly matrix inversion and trasposition. The dimentions of the
matrices used are about 29x19,19x19 and 29x29.

During the calculation, I noticed an apparent error of
inverion of a 19x19 matrix. Denote this matrix as KK, U=KK^ -1, I
found the product of U and KK is not equivalent to unit matrix! This
apparently violate the definition of inversion. The inversion is
through the function linalg.inv().

I have checked that det(KK)=-1.2E+40. At first, I thought the
error may be caused by such a large determinant, so I scaled it as
LL=KK/100, then invert LL. Since det(LL)=11.5 and all its elements are
within -180.0 to 850.0, this seems easier. But the result is still not
correct, the product of LL^-1 thus obtained and LL still not unit
matrix ... At the same time, the inversion results of some 29x19
matrices are correct.

So, can you tell me what goes wrong? Is this a bug in
Numpy.linalg? How to deal with this situation? If you need, I can
post the matrix I used below, but it is so long,so not at the moment.

lancered, Apr 4, 2007

2. ### Robin BeckerGuest

lancered wrote:
> Hi dear all,

...........
> matrices are correct.
>
> So, can you tell me what goes wrong? Is this a bug in
> Numpy.linalg? How to deal with this situation? If you need, I can
> post the matrix I used below, but it is so long,so not at the moment.

........

presumably the matrix KK is actually some kind of normal matrix obtained from
the data. So you have say n variables and m observations the data matrix is than
an n x m real valued thing say D then you want the inverse of something like D'D
ie an n by n thing. Typically the data D is de-meaned and normalized by the
column norms so that you end up with a fairly well scaled problem.

A long time ago I used Numeric+python to do exactly this sort of calculation
with excellent results and the matrices were as large or larger eg 100 x 100 and
above. I don't think the underlying numeric routines have changed that much. If
your matrix is symmetric then you should certainly be using

Even if you can't post the matrix, perhaps you should indicate how you proceed
from data to matrix. Another problem is that a large determinant is no guarantee
of stability for the inversion. If the largest eigenvalue is 10**100 and the
smallest 10**-200 I probably have an ill determined problem; surprisingly easy
to achieve
--
Robin Becker

Robin Becker, Apr 4, 2007

3. ### Robin BeckerGuest

Robin Becker wrote:
> lancered wrote:

h. If
> your matrix is symmetric then you should certainly be using

.....

a qr decomposition I meant to say
--
Robin Becker

Robin Becker, Apr 4, 2007
4. ### lanceredGuest

Here is the eigenvalues of KK I obtained:

>>> linalg.eigvals(KK)

array([ 1.11748411e+05, 3.67154458e+04, 3.41580846e+04,
2.75272440e+04, 2.09790868e+04, 1.86242332e+04,
8.68628325e+03, 6.66127732e+03, 6.15547187e+03,
4.68626197e+03, 3.17838339e+03, 2.84888045e+03,
1.88279736e+03, 1.32427574e+03, 1.04946287e+03,
5.79303171e+02, 3.83111876e+02, 4.93826556e-12,
1.50263232e-12])

You are right. The ratio of max/min eigenvalues is 7.4368432669e+016
Maybe this exceed the of precision of my machine?

Is there any tricks for me to be able to deal with this matrix
correctly with
NumPy?

On Apr 4, 3:58 pm, Robin Becker <> wrote:
> lancered wrote:
> > Hi dear all,

> ..........
> > matrices are correct.

>
> > So, can you tell me what goes wrong? Is this a bug in
> > Numpy.linalg? How to deal with this situation? If you need, I can
> > post the matrix I used below, but it is so long,so not at the moment.

>
> .......
>
> presumably the matrix KK is actually some kind of normal matrix obtained from
> the data. So you have say n variables and m observations the data matrix is than
> an n x m real valued thing say D then you want the inverse of something like D'D
> ie an n by n thing. Typically the data D is de-meaned and normalized by the
> column norms so that you end up with a fairly well scaled problem.
>
> A long time ago I used Numeric+python to do exactly this sort of calculation
> with excellent results and the matrices were as large or larger eg 100 x 100 and
> above. I don't think the underlying numeric routines have changed that much. If
> your matrix is symmetric then you should certainly be using
>
> Even if you can't post the matrix, perhaps you should indicate how you proceed
> from data to matrix. Another problem is that a large determinant is no guarantee
> of stability for the inversion. If the largest eigenvalue is 10**100 and the
> smallest 10**-200 I probably have an ill determined problem; surprisingly easy
> to achieve
> --
> Robin Becker

lancered, Apr 4, 2007
5. ### Robin BeckerGuest

lancered wrote:
> Here is the eigenvalues of KK I obtained:
>
> >>> linalg.eigvals(KK)

> array([ 1.11748411e+05, 3.67154458e+04, 3.41580846e+04,
> 2.75272440e+04, 2.09790868e+04, 1.86242332e+04,
> 8.68628325e+03, 6.66127732e+03, 6.15547187e+03,
> 4.68626197e+03, 3.17838339e+03, 2.84888045e+03,
> 1.88279736e+03, 1.32427574e+03, 1.04946287e+03,
> 5.79303171e+02, 3.83111876e+02, 4.93826556e-12,
> 1.50263232e-12])
>
> You are right. The ratio of max/min eigenvalues is 7.4368432669e+016
> Maybe this exceed the of precision of my machine?
>
> Is there any tricks for me to be able to deal with this matrix
> correctly with
> NumPy?
>

........

You have to be very careful with a set of eigenvalues like the above. The
implication is that you have a rank deficiency of two ie either you have to
small a number of observations or there's something fishy about the model (two
of the data matrix columns can be written as linear combinations of the others
or two of the columns are structurally zero). If you really believe the data to
be good then normalize to unit column length first ie try using

[d(1)/||d(1)||.....d(n)/||d(n)||]

and see if the eigenvalues are still zero (this will only be of benefit if the
data is vastly different in scale).

I often got bad results, but that was mostly bad programming and or measuring
the voltage of the earth wire by mistake . Some problems naturally have a zero
at the origin.
--
Robin Becker

Robin Becker, Apr 4, 2007
6. ### Lou PecoraGuest

In article <>,
"lancered" <> wrote:

> Here is the eigenvalues of KK I obtained:
>
> >>> linalg.eigvals(KK)

> array([ 1.11748411e+05, 3.67154458e+04, 3.41580846e+04,
> 2.75272440e+04, 2.09790868e+04, 1.86242332e+04,
> 8.68628325e+03, 6.66127732e+03, 6.15547187e+03,
> 4.68626197e+03, 3.17838339e+03, 2.84888045e+03,
> 1.88279736e+03, 1.32427574e+03, 1.04946287e+03,
> 5.79303171e+02, 3.83111876e+02, 4.93826556e-12,
> 1.50263232e-12])
>
> You are right. The ratio of max/min eigenvalues is 7.4368432669e+016
> Maybe this exceed the of precision of my machine?
>
> Is there any tricks for me to be able to deal with this matrix
> correctly with
> NumPy?

That sounds large to me, too. Close to the floating point accuracy.
The problem is not with NumPy,but with double precision numbers. No
routine can save you if the condition number is large. However, several
people here have noted that you might be able to solve your problem by
avoiding inverting the matrix in the first place. In other words,
depending on your particular problem, there may be other ways to solve
it beside brute force inversion. Can you Use a QR or SVD approach?

-- Lou Pecora (my views are my own) REMOVE THIS to email me.

Lou Pecora, Apr 4, 2007
7. ### Travis OliphantGuest

lancered wrote:
>
>
> So, can you tell me what goes wrong? Is this a bug in
> Numpy.linalg? How to deal with this situation? If you need, I can
> post the matrix I used below, but it is so long,so not at the moment.
>

As you discovered, it is very likely your problem is a very high
condition number.

The easiest thing to do is to use

numpy.linalg.pinv

to perform a pseudo-inverse which will only use the singular-values that
are well-conditioned to compute the inverse.

This will still not give you an exact identity, but at least you will
know you aren't amplifiying low-valued singular vectors.

-Travis

Travis Oliphant, Apr 4, 2007