# Re: Linear regression in 3 dimensions

Discussion in 'Python' started by Robert Kern, Sep 2, 2006.

1. ### Robert KernGuest

wrote:
> Hi all,
>
> I am seeking a module that will do the equivalent of linear regression in
> 3D to yield a best fit a plane through a set of points (X1, Y1, Z1), (X1,
> Y1, Z1),... (Xn, Yn, Zn).
>
> The resulting equation to be of the form:
>
> Z = aX + bY + c
>
> The function I need would take the set of points and return a,c & c Any
> pointers to existing code / modules would be very helpful.

Well, that's a very unspecified problem. You haven't defined "best."

But if we make the assumption that you want to minimize the squared error in Z,
that is minimize

Sum((Z - (a*X + b*Y + c)) ** 2)

then this is a standard linear algebra problem.

In [1]: import numpy as np

In [2]: a = 1.0

In [3]: b = 2.0

In [4]: c = 3.0

In [5]: rs = np.random.RandomState(1234567890) # Specify a seed for repeatability

In [6]: x = rs.uniform(size=100)

In [7]: y = rs.uniform(size=100)

In [8]: e = rs.standard_normal(size=100)

In [9]: z = a*x + b*y + c + e

In [10]: A = np.column_stack([x, y, np.ones_like(x)])

In [11]: np.linalg.lstsq?
Type: function
Base Class: <type 'function'>
String Form: <function lstsq at 0x6df070>
Namespace: Interactive
File:
/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-1.0b2.dev3002-py2.4-macosx-10.4-ppc.egg/numpy/linalg/linalg.py
Definition: np.linalg.lstsq(a, b, rcond=1e-10)
Docstring:
returns x,resids,rank,s
where x minimizes 2-norm(|b - Ax|)
resids is the sum square residuals
rank is the rank of A
s is the rank of the singular values of A in descending order

If b is a matrix then x is also a matrix with corresponding columns.
If the rank of A is less than the number of columns of A or greater than
the number of rows, then residuals will be returned as an empty array
otherwise resids = sum((b-dot(A,x)**2).
Singular values less than s[0]*rcond are treated as zero.

In [12]: abc, residuals, rank, s = np.linalg.lstsq(A, z)

In [13]: abc
Out[13]: array([ 0.93104714, 1.96780364, 3.15185125])

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
an underlying truth."
-- Umberto Eco
Robert Kern, Sep 2, 2006

2. ### Guest

Hi Robert,

I'm using the scipy package for such problems. In the submodule
scipy.optimize there is an implmentation of a least-square fitting
algorithm (Levenberg-Marquardt) called leastsq.

You have to define a function that computes the residuals between your
model and the data points:

import scipy.optimize

def model(parameter, x, y):
a, b, c = parameter
return a*x + b*y + c

def residual(parameter, data, x, y):
res = []
for _x in x:
for _y in y:
res.append(data-model(parameter,x,y)
return res

params0 = [1., 1.,1.]
result = scipy.optimize.leastsq(resdiual, params0, (data,x,y))
fittedParams = result[0]

If you haven't used numeric, numpy or scipy before, you should take a
look at an introduction. It uses some nice extended array objects,
where you can use some neat index tricks and compute values of array
items without looping through it.

Cheers! Bernhard

Robert Kern wrote:
> wrote:
> > Hi all,
> >
> > I am seeking a module that will do the equivalent of linear regression in
> > 3D to yield a best fit a plane through a set of points (X1, Y1, Z1), (X1,
> > Y1, Z1),... (Xn, Yn, Zn).
> >
> > The resulting equation to be of the form:
> >
> > Z = aX + bY + c
> >
> > The function I need would take the set of points and return a,c & c Any
> > pointers to existing code / modules would be very helpful.

>
> Well, that's a very unspecified problem. You haven't defined "best."
>
> But if we make the assumption that you want to minimize the squared error in Z,
> that is minimize
>
> Sum((Z - (a*X + b*Y + c)) ** 2)
>
> then this is a standard linear algebra problem.
>
> In [1]: import numpy as np
>
> In [2]: a = 1.0
>
> In [3]: b = 2.0
>
> In [4]: c = 3.0
>
> In [5]: rs = np.random.RandomState(1234567890) # Specify a seed for repeatability
>
> In [6]: x = rs.uniform(size=100)
>
> In [7]: y = rs.uniform(size=100)
>
> In [8]: e = rs.standard_normal(size=100)
>
> In [9]: z = a*x + b*y + c + e
>
> In [10]: A = np.column_stack([x, y, np.ones_like(x)])
>
> In [11]: np.linalg.lstsq?
> Type: function
> Base Class: <type 'function'>
> String Form: <function lstsq at 0x6df070>
> Namespace: Interactive
> File:
> /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-1.0b2.dev3002-py2.4-macosx-10.4-ppc.egg/numpy/linalg/linalg.py
> Definition: np.linalg.lstsq(a, b, rcond=1e-10)
> Docstring:
> returns x,resids,rank,s
> where x minimizes 2-norm(|b - Ax|)
> resids is the sum square residuals
> rank is the rank of A
> s is the rank of the singular values of A in descending order
>
> If b is a matrix then x is also a matrix with corresponding columns.
> If the rank of A is less than the number of columns of A or greater than
> the number of rows, then residuals will be returned as an empty array
> otherwise resids = sum((b-dot(A,x)**2).
> Singular values less than s[0]*rcond are treated as zero.
>
>
> In [12]: abc, residuals, rank, s = np.linalg.lstsq(A, z)
>
> In [13]: abc
> Out[13]: array([ 0.93104714, 1.96780364, 3.15185125])
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
> that is made terrible by our own mad attempt to interpret it as though it had
> an underlying truth."
> -- Umberto Eco
, Sep 4, 2006

3. ### Andrew McLeanGuest

Bernhard,

Levenberg-Marquardt is a good solution when you want to solve a general
non-linear least-squares problem. As Robert said, the OPs problem is
linear and Robert's solution exploits that. Using LM here is unnecessary
and I suspect a fair bit less efficient (i.e. slower).

- Andrew

wrote:
> Hi Robert,
>
> I'm using the scipy package for such problems. In the submodule
> scipy.optimize there is an implmentation of a least-square fitting
> algorithm (Levenberg-Marquardt) called leastsq.
>
> You have to define a function that computes the residuals between your
> model and the data points:
>
> import scipy.optimize
>
> def model(parameter, x, y):
> a, b, c = parameter
> return a*x + b*y + c
>
> def residual(parameter, data, x, y):
> res = []
> for _x in x:
> for _y in y:
> res.append(data-model(parameter,x,y)
> return res
>
> params0 = [1., 1.,1.]
> result = scipy.optimize.leastsq(resdiual, params0, (data,x,y))
> fittedParams = result[0]
>
> If you haven't used numeric, numpy or scipy before, you should take a
> look at an introduction. It uses some nice extended array objects,
> where you can use some neat index tricks and compute values of array
> items without looping through it.
>
> Cheers! Bernhard
>
>
>
> Robert Kern wrote:
>> wrote:
>>> Hi all,
>>>
>>> I am seeking a module that will do the equivalent of linear regression in
>>> 3D to yield a best fit a plane through a set of points (X1, Y1, Z1), (X1,
>>> Y1, Z1),... (Xn, Yn, Zn).
>>>
>>> The resulting equation to be of the form:
>>>
>>> Z = aX + bY + c
>>>
>>> The function I need would take the set of points and return a,c & c Any
>>> pointers to existing code / modules would be very helpful.

>> Well, that's a very unspecified problem. You haven't defined "best."
>>
>> But if we make the assumption that you want to minimize the squared error in Z,
>> that is minimize
>>
>> Sum((Z - (a*X + b*Y + c)) ** 2)
>>
>> then this is a standard linear algebra problem.
>>
>> In [1]: import numpy as np
>>
>> In [2]: a = 1.0
>>
>> In [3]: b = 2.0
>>
>> In [4]: c = 3.0
>>
>> In [5]: rs = np.random.RandomState(1234567890) # Specify a seed for repeatability
>>
>> In [6]: x = rs.uniform(size=100)
>>
>> In [7]: y = rs.uniform(size=100)
>>
>> In [8]: e = rs.standard_normal(size=100)
>>
>> In [9]: z = a*x + b*y + c + e
>>
>> In [10]: A = np.column_stack([x, y, np.ones_like(x)])
>>
>> In [11]: np.linalg.lstsq?
>> Type: function
>> Base Class: <type 'function'>
>> String Form: <function lstsq at 0x6df070>
>> Namespace: Interactive
>> File:
>> /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-1.0b2.dev3002-py2.4-macosx-10.4-ppc.egg/numpy/linalg/linalg.py
>> Definition: np.linalg.lstsq(a, b, rcond=1e-10)
>> Docstring:
>> returns x,resids,rank,s
>> where x minimizes 2-norm(|b - Ax|)
>> resids is the sum square residuals
>> rank is the rank of A
>> s is the rank of the singular values of A in descending order
>>
>> If b is a matrix then x is also a matrix with corresponding columns.
>> If the rank of A is less than the number of columns of A or greater than
>> the number of rows, then residuals will be returned as an empty array
>> otherwise resids = sum((b-dot(A,x)**2).
>> Singular values less than s[0]*rcond are treated as zero.
>>
>>
>> In [12]: abc, residuals, rank, s = np.linalg.lstsq(A, z)
>>
>> In [13]: abc
>> Out[13]: array([ 0.93104714, 1.96780364, 3.15185125])
>>
>> --
>> Robert Kern
>>
>> "I have come to believe that the whole world is an enigma, a harmless enigma
>> that is made terrible by our own mad attempt to interpret it as though it had
>> an underlying truth."
>> -- Umberto Eco

>
Andrew McLean, Sep 8, 2006

Andrew McLean wrote:
> Bernhard,
>
> Levenberg-Marquardt is a good solution when you want to solve a general
> non-linear least-squares problem. As Robert said, the OPs problem is
> linear and Robert's solution exploits that. Using LM here is unnecessary
> and I suspect a fair bit less efficient (i.e. slower).
>
> - Andrew
>
>
> wrote:
>> Hi Robert,
>>
>> I'm using the scipy package for such problems. In the submodule
>> scipy.optimize there is an implmentation of a least-square fitting
>> algorithm (Levenberg-Marquardt) called leastsq.
>>
>> You have to define a function that computes the residuals between your
>> model and the data points:
>>
>> import scipy.optimize
>>
>> def model(parameter, x, y):
>> a, b, c = parameter
>> return a*x + b*y + c
>>
>> def residual(parameter, data, x, y):
>> res = []
>> for _x in x:
>> for _y in y:
>> res.append(data-model(parameter,x,y)
>> return res
>>
>> params0 = [1., 1.,1.]
>> result = scipy.optimize.leastsq(resdiual, params0, (data,x,y))
>> fittedParams = result[0]
>>
>> If you haven't used numeric, numpy or scipy before, you should take a
>> look at an introduction. It uses some nice extended array objects,
>> where you can use some neat index tricks and compute values of array
>> items without looping through it.
>>
>> Cheers! Bernhard
>>

<<snip>>

Hi Bernhard,
I am just starting to learn Python; could you plz tell me specifically
the introduction(s) you have in mind?

TIA,
DaveB
David J. Braden, Sep 14, 2006
5. ### Guest

Hi Dave!

> <<snip>>
>
> Hi Bernhard,
> I am just starting to learn Python; could you plz tell me specifically
> the introduction(s) you have in mind?
>
> TIA,
> DaveB

Take a look at the documenation section on www.scipy.org.

and the scipy tutorial (there's an 'old' version as pdf) were helpfull.

There is also a fee based e-book on NumPy which I should buy to support
the development, because the package is so great

For matplotlib take a look at http://matplotlib.sourceforge.net/ and
get the user's guide.

Enjoy! Bernhard
, Sep 14, 2006

wrote:
> Hi Dave!
>
>> <<snip>>
>>
>> Hi Bernhard,
>> I am just starting to learn Python; could you plz tell me specifically
>> the introduction(s) you have in mind?
>>
>> TIA,
>> DaveB

>
> Take a look at the documenation section on www.scipy.org.
>
> and the scipy tutorial (there's an 'old' version as pdf) were helpfull.
>
> There is also a fee based e-book on NumPy which I should buy to support
> the development, because the package is so great
>
> For matplotlib take a look at http://matplotlib.sourceforge.net/ and
> get the user's guide.
>
> Enjoy! Bernhard
>

Whoa, your suggestions are great! I used to use MatLab, so your 3rd
suggestion particularly hits home. With these packages and the nature of
Python, its starting to feel a bit like S (well, in my case, R).

Thanks mucho (oder sollte ich vielen Dank sagen?). Regards,
DaveB
David J. Braden, Sep 15, 2006