ScientificPython - LeastSquareFit diverges

Harold Fellermann

Dear all,

I am trying to fit a power law to a small dataset using
leastSquaresFit from Scientific.Functions.LeastSquares.
Unfortunately, the algorithm seems to diverge and throws an
OverflowError.
Here is how I try it:
from Scientific.Functions.LeastSquares import leastSquaresFit

data = [
    (2.5, 589.0, 0.10000000000000001),
    (7.5, 442.0, 0.10000000000000001),
    (12.5, 96.0, 0.10000000000000001),
    (17.5, 36.0, 0.10000000000000001),
    (22.5, 16.0, 0.10000000000000001),
    (27.5, 7.0, 0.10000000000000001),
    (32.5, 6.0, 0.10000000000000001),
    (37.5, 3.0, 0.10000000000000001),
    (42.5, 3.0, 0.10000000000000001),
    (47.5, 1.0, 0.10000000000000001),
    (52.5, 3.0, 0.10000000000000001),
    (57.5, 1.0, 0.10000000000000001),
    (67.5, 1.0, 0.10000000000000001),
    (77.5, 2.0, 0.10000000000000001),
    (82.5, 1.0, 0.10000000000000001),
    (87.5, 2.0, 0.10000000000000001)
]

def powerlaw((a, b), x):
    return a*x**b

leastSquaresFit(powerlaw, (10, -3), data)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/Scientific/Functions/LeastSquares.py", line 72, in leastSquaresFit
    next_chi_sq, next_alpha = _chiSquare(model, next_p, data)
  File "/usr/lib/python2.4/site-packages/Scientific/Functions/LeastSquares.py", line 22, in _chiSquare
    f = model(parameters, point[0])
  File "<stdin>", line 2, in powerlaw
  File "/usr/lib/python2.4/site-packages/Scientific/Functions/FirstDerivatives.py", line 182, in __rpow__
    return pow(other, self)
  File "/usr/lib/python2.4/site-packages/Scientific/Functions/FirstDerivatives.py", line 171, in __pow__
    raise OverflowError, "Numerical result of pow(%s,%s) out of range." % (self.value,other.value-1)
OverflowError: Numerical result of pow(2.5,8376.79243687) out of range.

I added some debugging output to the function _chiSquare in
/usr/lib/python2.4/site-packages/Scientific/Functions/LeastSquares.py
that prints the fit parameters during the Levenberg-Marquardt iteration.
The procedure seems to diverge after the first step:

((10, [1]), (-3, [0, 1]))
[(-67402.311817579117, [1]), (8377.7924368716158, [0, 1])]
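
The size of the failing power can be checked directly. The following is a minimal sketch for current Python 3 (not the Python 2.4 shown in the traceback); the base and exponent are copied from the OverflowError message, and the variable names are only illustrative.

```python
import math
import sys

# Base and exponent copied from the OverflowError message in the traceback.
base = 2.5
exponent = 8376.79243687

# Decimal magnitude the result would need, versus what a double can hold.
needed_digits = exponent * math.log10(base)
max_digits = math.log10(sys.float_info.max)
print("result would be about 1e%d, largest double is about 1e%d"
      % (needed_digits, max_digits))

# A plain Python float power fails in the same way.
try:
    base ** exponent
    overflowed = False
except OverflowError:
    overflowed = True
print("overflow:", overflowed)
```

So once the exponent jumps from -3 to roughly 8378 in a single step, no evaluation of a*x**b can succeed, whatever the value of a.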

Note that I could easily fit the above data using gnuplot's internal
fitting procedure. Any idea what is going wrong here? Is it a known
problem? Are there any workarounds or other packages?

Any help is appreciated!

- harold -
 
Terry Reedy

Harold Fellermann said:
> I am trying to fit a power law to a small dataset using
> leastSquaresFit from Scientific.Functions.LeastSquares.

This is a bit off-topic here, and normally better for the scipy list, but I
have some experience with nonlinear least squares.

> Unfortunately, the algorithm seems to diverge and throws an
> OverflowError.

Assuming the program is okay, this means that either the function
mismatches the data or the initial values are too far off to converge.

> Here is how I try it:
> from Scientific.Functions.LeastSquares import leastSquaresFit
>
> data = [
> ... (2.5, 589.0, 0.10000000000000001),
> ... (7.5, 442.0, 0.10000000000000001),
> ... (12.5, 96.0, 0.10000000000000001),

I presume that the tuples are (x, y, some_error_indicator). But the last does
not matter here.

> ... return a*x**b

Did you try plotting log y versus log x to see if you get approximately a
straight line? If so, the intercept and slope are estimates of log a and b.

I presume (10,-3) is the starting (a,b). But, for instance, 10*7.5**-3 =
.02, which has no relation to 442, whereas, for instance, 1000*7.5**-.75 =
221, which is in the ballpark, at least. So (a,b)=(1000, -.75) might have
a chance.
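
The comparison of starting values above can be reproduced in a few lines. This is a sketch in plain Python 3 that does not use ScientificPython; the helper names powerlaw and sum_sq are made up for illustration, and the data are the (x, y) columns from the original post.

```python
# (x, y) pairs from the original post; the third column is dropped.
data = [
    (2.5, 589.0), (7.5, 442.0), (12.5, 96.0), (17.5, 36.0),
    (22.5, 16.0), (27.5, 7.0), (32.5, 6.0), (37.5, 3.0),
    (42.5, 3.0), (47.5, 1.0), (52.5, 3.0), (57.5, 1.0),
    (67.5, 1.0), (77.5, 2.0), (82.5, 1.0), (87.5, 2.0),
]

def powerlaw(a, b, x):
    """The model y = a * x**b."""
    return a * x ** b

def sum_sq(a, b):
    """Sum of squared residuals of the model over all data points."""
    return sum((y - powerlaw(a, b, x)) ** 2 for x, y in data)

# Compare the two candidate starting points discussed above.
for a, b in [(10.0, -3.0), (1000.0, -0.75)]:
    print("a=%g b=%g  model(7.5)=%.3g  sum of squares=%.4g"
          % (a, b, powerlaw(a, b, 7.5), sum_sq(a, b)))
```

A starting point whose sum of squares is already several times smaller gives the iteration a much better chance of staying in a sensible region.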

Terry Jan Reedy
 
konrad.hinsen

> ... return a*x**b

Fitting power laws is a tricky business; you need a pretty good
initial guess to get convergence.

> Note that I could easily fit the above data using gnuplot's internal
> fitting procedure. Any idea what is going wrong here? Is it a known
> problem? Are there any workarounds or other packages?

My suggestion is to fit, at least as a first step, the logarithms of
your data points:

import Numeric as N

def powerlaw_log((a, b), x):
    return N.log(a) + b*N.log(x)

params1, chisq = leastSquaresFit(powerlaw_log, (10., -3.),
                                 [(x, N.log(y)) for x, y, sigma in data])


You can then use those parameters as starting values for fitting your
original problem:

params2, chisq = leastSquaresFit(powerlaw, params1, data)

Doing this for your data yields:

params1: [9469.9675999067185, -2.0881423620750521]

params2: [1591.4025775162165, -1.0112284948049179]

The big difference between the two fits is a further indicator for a
stability problem. I would trust the first set more than the second one.
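
For readers without ScientificPython, the second stage can be sketched in plain Python 3. This is not the library's implementation: it is a small damped Gauss-Newton loop (in the spirit of Levenberg-Marquardt) written for this two-parameter model only. The names chi_sq and refine are invented for the example, and the starting values are params1 as reported above. Its result need not reproduce params2 exactly.

```python
import math

# (x, y) pairs from the original post; all sigmas are equal, so they are omitted.
data = [
    (2.5, 589.0), (7.5, 442.0), (12.5, 96.0), (17.5, 36.0),
    (22.5, 16.0), (27.5, 7.0), (32.5, 6.0), (37.5, 3.0),
    (42.5, 3.0), (47.5, 1.0), (52.5, 3.0), (57.5, 1.0),
    (67.5, 1.0), (77.5, 2.0), (82.5, 1.0), (87.5, 2.0),
]

def chi_sq(a, b):
    """Sum of squared residuals of y = a * x**b; infinite if a power overflows."""
    try:
        return sum((y - a * x ** b) ** 2 for x, y in data)
    except OverflowError:
        return float("inf")

def refine(a, b, lam=1e-3, max_iter=200, tol=1e-12):
    """Damped Gauss-Newton steps; a step is kept only if it lowers chi_sq."""
    chi = chi_sq(a, b)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r for the two parameters.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for x, y in data:
            xb = x ** b
            r = y - a * xb
            d_a = xb                      # d(model)/da
            d_b = a * xb * math.log(x)    # d(model)/db
            s_aa += d_a * d_a
            s_ab += d_a * d_b
            s_bb += d_b * d_b
            g_a += d_a * r
            g_b += d_b * r
        # Solve the 2x2 system with the diagonal inflated by (1 + lam).
        m_aa = s_aa * (1.0 + lam)
        m_bb = s_bb * (1.0 + lam)
        det = m_aa * m_bb - s_ab * s_ab
        step_a = (g_a * m_bb - g_b * s_ab) / det
        step_b = (g_b * m_aa - g_a * s_ab) / det
        new_chi = chi_sq(a + step_a, b + step_b)
        if new_chi < chi:
            improvement = chi - new_chi
            a, b, chi = a + step_a, b + step_b, new_chi
            lam /= 10.0                   # trust the quadratic model more
            if improvement <= tol * chi:
                break
        else:
            lam *= 10.0                   # reject the step, damp harder
    return a, b, chi

# Stage 2: start from the log-space result (params1 in the thread).
start_a, start_b = 9469.9675999067185, -2.0881423620750521
start_chi = chi_sq(start_a, start_b)
a, b, chi = refine(start_a, start_b)
print("start: chi_sq=%.6g" % start_chi)
print("final: a=%.6g b=%.6g chi_sq=%.6g" % (a, b, chi))
```

Because a step is only accepted when it lowers the sum of squares, and an overflowing trial point counts as infinitely bad, this loop cannot run away in the manner of the traceback in the original post.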

As a general rule, the model to be fitted should be a smoothly
varying function of the parameters, and the same should be true for
the derivatives.

The second general rule is never to trust a non-linear fit algorithm
blindly. Look at your data first, see if the model can be a good fit,
and play with some parameter values to get a feeling for how they
influence the fit. Plotting your data set, it is immediately clear
that the first point ruins any nice power law behaviour. You might
thus prefer to do the fit without the first point, and you will get a
much better defined exponent:

params1: [31363.301954929859, -2.4047303053979046]
params2: [182522.2346197216, -2.9893640209815757]

Plotting the models corresponding to these two sets together with the
data, you will see that everything coincides well for large x values,
meaning that the first two points make all the difference - another
pointer towards a lack of stability in the fit.
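
The sensitivity to individual points can be made concrete with a leave-one-out check in log space. The sketch below uses plain Python 3 and a closed-form straight-line fit of log y against log x instead of leastSquaresFit; the function name loglog_fit and the variable names are invented for this example.

```python
import math

# (x, y) pairs from the original post.
data = [
    (2.5, 589.0), (7.5, 442.0), (12.5, 96.0), (17.5, 36.0),
    (22.5, 16.0), (27.5, 7.0), (32.5, 6.0), (37.5, 3.0),
    (42.5, 3.0), (47.5, 1.0), (52.5, 3.0), (57.5, 1.0),
    (67.5, 1.0), (77.5, 2.0), (82.5, 1.0), (87.5, 2.0),
]

def loglog_fit(points):
    """Closed-form straight-line fit of log(y) against log(x); returns (a, b)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(y) for _, y in points]
    n = len(points)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                  # slope = exponent
    a = math.exp(my - b * mx)      # intercept = log(a)
    return a, b

a_all, b_all = loglog_fit(data)
print("all points:       a=%.1f  b=%.4f" % (a_all, b_all))

# Refit with each point left out in turn and record how far the exponent moves.
shifts = []
for i in range(len(data)):
    a_i, b_i = loglog_fit(data[:i] + data[i + 1:])
    shifts.append(abs(b_i - b_all))
    print("without x=%-5g  a=%.1f  b=%.4f" % (data[i][0], a_i, b_i))

most_influential = shifts.index(max(shifts))
print("largest shift in b comes from dropping x=%g" % data[most_influential][0])
```

If one point moves the exponent far more than any other, the fitted power law rests largely on that point.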

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: (e-mail address removed)
---------------------------------------------------------------------
 
Harold Fellermann

Thanks for your advice, Terry and Konrad,

using the linear fit as the initial condition for the power-law fit works
pretty well for my data. (I already had the two calculations but performed
them the other way round ... :)

Anyway, I had the impression that leastSquaresFit in Scientific Python
is an implementation of the Levenberg-Marquardt algorithm as it is
presented in Numerical Recipes. According to reviews, this algorithm is
not famous for its stability
(e.g. http://www.stanford.edu/class/cme302/wnnr/nr.html). Better
implementations are out there
(e.g. http://www.ics.forth.gr/~lourakis/levmar/). Are there any plans to
improve the SciPy algorithm? Would it be a welcome contribution to SciPy
to work this part out?

- harold -
 
Konrad Hinsen

> Anyway, I had the impression that leastSquaresFit in Scientific Python
> is an implementation of the Levenberg-Marquardt algorithm as it is
> presented in Numerical Recipes.

True.

> According to reviews, this algorithm is not famous for its stability
> (e.g. http://www.stanford.edu/class/cme302/wnnr/nr.html). Better
> implementations are out there
> (e.g. http://www.ics.forth.gr/~lourakis/levmar/). Are there any plans to
> improve the SciPy algorithm? Would it be a welcome contribution to SciPy
> to work this part out?

Yes, definitely. And no, I have no plans to do it myself any time
soon. The current implementation has always been sufficient for my
needs, and time is scarce...

BTW, ScientificPython (http://dirac.cnrs-orleans.fr/ScientificPython/)
is not the same thing as SciPy (http://www.scipy.org/).
Both are scientific libraries for Python, but their
focus is different: ScientificPython aims at providing pythonic
modules for scientific computing, whereas SciPy's objective is to
provide Python interfaces to the large pool of scientific libraries
from the Fortran/C/C++ world. Scientific users of Python should
probably have both of them installed.

Konrad.
 
