best way(s) to fit a polynomial to points?

K

Kyler Laird

I'm trying to do some georeferencing - using points of known
location (ground control points, GCPs) on an image to develop
a polynomial that can be used to approximate the locations of
other points.

I usually use the Python bindings to GDAL
http://www.remotesensing.org/gdal/
to manipulate geospatial data but there are no Python bindings
for GDALCreateGCPTransformer().
http://remotesensing.org/cgi-bin/cv...dal_crs.c?rev=1.4.2.1&content-type=text/plain
I tried using it from C (both through GDAL and directly to
the GRASS code) and failed.

I'd rather have a Python-based solution anyway, so I'm
searching for appropriate tools. The ones I'm finding (like
matplotlib.mlab.polyfit() and PyBLD's polfit()) only deal with
single variable equations (f(x) = X). I need something that
operates on two variables (f(x, y) = X). (I'd have two
polynomials; one would calculate the northing and the other
would do the easting). I need to be able to fit at least
third degree polynomials.

The solution does not need to be especially fast. A pure
Python solution is preferred but I'll take about anything
that works right now. (I'm trying to show someone how to do
this. I've only used proprietary software to do it but I've
been meaning to write my own.)

Any suggestions?

Thank you.

--kyler
 
C

Cameron Laird

I'm trying to do some georeferencing - using points of known
location (ground control points, GCPs) on an image to develop
a polynomial that can be used to approximate the locations of
other points.

I usually use the Python bindings to GDAL
http://www.remotesensing.org/gdal/
to manipulate geospatial data but there are no Python bindings
for GDALCreateGCPTransformer().
http://remotesensing.org/cgi-bin/cv...dal_crs.c?rev=1.4.2.1&content-type=text/plain
I tried using it from C (both through GDAL and directly to
the GRASS code) and failed.

I'd rather have a Python-based solution anyway, so I'm
searching for appropriate tools. The ones I'm finding (like
matplotlib.mlab.polyfit() and PyBLD's polfit()) only deal with
single variable equations (f(x) = X). I need something that
operates on two variables (f(x, y) = X). (I'd have two
polynomials; one would calculate the northing and the other
would do the easting). I need to be able to fit at least
third degree polynomials.

The solution does not need to be especially fast. A pure
Python solution is preferred but I'll take about anything
that works right now. (I'm trying to show someone how to do
this. I've only used proprietary software to do it but I've
been meaning to write my own.)
.
.
.
"I tried using it ... and failed." I naturally wonder *how*
you (it, more likely) failed--the specific details of what
you actually observed. I can imagine, though, that if you
had the time to understand and describe what happened with
precision, it'd be easier fixing it than talking about it.
We can live this part aside for now.

Curve-fitting is a domain where a lot of subtlety accompanies
the transition from one to two dimensions. Let me repeat in
my own words some of what you've said, to be certain I under-
stand: you have a table of observations, in columns

x y X

with a large number of rows. You have reason to believe that
the table can be adequately modeled by a polynomial of low
degree in x and y, jointly, plus small non-systematic errors.
The use of this polynomial will be to estimate new X-s from
new values of x and y.

Will the new (x,y)-s be interior to the convex hull formed by
the tabulated ones, or might they be outside?

I don't know PyBLD and matplotlib. It's *very* likely, though,
that there's a simple way to restate the problem so that the
least-squares' solvers they must embed can be applied to
simply transformed data to yield polynomial coefficients. How
soon does this thing need to go live?
 
K

Kyler Laird

"I tried using it ... and failed." I naturally wonder *how*
you (it, more likely) failed--the specific details of what
you actually observed.

I was generating the test points with a simple algorithm. I
finally discovered that the fitting algorithm fails if the
points are (close to being) colinear.
I can imagine, though, that if you
had the time to understand and describe what happened with
precision, it'd be easier fixing it than talking about it.

I'll append the code that I successfully used.
Will the new (x,y)-s be interior to the convex hull formed by
the tabulated ones, or might they be outside?

It's good practice to make them interior (have ground control
points around the perimeter of the region of interest) but it
isn't always the case. The solution should be general enough
to work for outside points but it's understood that the error
for them is likely to be high.
How
soon does this thing need to go live?

It's something I meant to do during my Remote Sensing class
last semester and then someone asked about it in another news
group recently. It's not urgent. I just wanted to solve it.

(I have some aerial photos that I georeferenced using ERDAS
Imagine but I want to use my own code so that I can do more
with them.)

I'm happy with the C code, but I'd still prefer a Python
solution.

--kyler

==============================================================

#include <stdio.h>

struct Control_Points
{
int count;
double *e1;
double *n1;
double *e2;
double *n2;
int *status;
};


/* STRUCTURE FOR USE INTERNALLY WITH THESE FUNCTIONS. THESE FUNCTIONS EXPECT
SQUARE MATRICES SO ONLY ONE VARIABLE IS GIVEN (N) FOR THE MATRIX SIZE */

struct MATRIX
{
int n; /* SIZE OF THIS MATRIX (N x N) */
double *v;
};

/* CALCULATE OFFSET INTO ARRAY BASED ON R/C */

#define M(row,col) m->v[(((row)-1)*(m->n))+(col)-1]


#define MSUCCESS 1 /* SUCCESS */
#define MNPTERR 0 /* NOT ENOUGH POINTS */
#define MUNSOLVABLE -1 /* NOT SOLVABLE */
#define MMEMERR -2 /* NOT ENOUGH MEMORY */
#define MPARMERR -3 /* PARAMETER ERROR */
#define MINTERR -4 /* INTERNAL ERROR */




#define MAXORDER 3

#define CPLFree free
#define CPLCalloc calloc


/* CALCULATE OFFSET INTO ARRAY BASED ON R/C */

#define M(row,col) m->v[(((row)-1)*(m->n))+(col)-1]


#define MSUCCESS 1 /* SUCCESS */
#define MNPTERR 0 /* NOT ENOUGH POINTS */
#define MUNSOLVABLE -1 /* NOT SOLVABLE */
#define MMEMERR -2 /* NOT ENOUGH MEMORY */
#define MPARMERR -3 /* PARAMETER ERROR */
#define MINTERR -4 /* INTERNAL ERROR */

/***************************************************************************/
/*
FUNCTION PROTOTYPES FOR STATIC (INTERNAL) FUNCTIONS
*/
/***************************************************************************/

static int calccoef(struct Control_Points *,double *,double *,int);
static int calcls(struct Control_Points *,struct MATRIX *,
double *,double *,double *,double *);
static int exactdet(struct Control_Points *,struct MATRIX *,
double *,double *,double *,double *);
static int solvemat(struct MATRIX *,double *,double *,double *,double *);
static double term(int,double,double);

/***************************************************************************/
/*
TRANSFORM A SINGLE COORDINATE PAIR.
*/
/***************************************************************************/

static int
CRS_georef (
double e1, /* EASTINGS TO BE TRANSFORMED */
double n1, /* NORTHINGS TO BE TRANSFORMED */
double *e, /* EASTINGS TO BE TRANSFORMED */
double *n, /* NORTHINGS TO BE TRANSFORMED */
double E[], /* EASTING COEFFICIENTS */
double N[], /* NORTHING COEFFICIENTS */
int order /* ORDER OF TRANSFORMATION TO BE PERFORMED, MUST MATCH THE
ORDER USED TO CALCULATE THE COEFFICIENTS */
)
{
double e3, e2n, en2, n3, e2, en, n2;

switch(order)
{
case 1:

*e = E[0] + E[1] * e1 + E[2] * n1;
*n = N[0] + N[1] * e1 + N[2] * n1;
break;

case 2:

e2 = e1 * e1;
n2 = n1 * n1;
en = e1 * n1;

*e = E[0] + E[1] * e1 + E[2] * n1 +
E[3] * e2 + E[4] * en + E[5] * n2;
*n = N[0] + N[1] * e1 + N[2] * n1 +
N[3] * e2 + N[4] * en + N[5] * n2;
break;

case 3:

e2 = e1 * e1;
en = e1 * n1;
n2 = n1 * n1;
e3 = e1 * e2;
e2n = e2 * n1;
en2 = e1 * n2;
n3 = n1 * n2;

*e = E[0] +
E[1] * e1 + E[2] * n1 +
E[3] * e2 + E[4] * en + E[5] * n2 +
E[6] * e3 + E[7] * e2n + E[8] * en2 + E[9] * n3;
*n = N[0] +
N[1] * e1 + N[2] * n1 +
N[3] * e2 + N[4] * en + N[5] * n2 +
N[6] * e3 + N[7] * e2n + N[8] * en2 + N[9] * n3;
break;

default:

return(MPARMERR);
break;
}

return(MSUCCESS);
}

/***************************************************************************/
/*
COMPUTE THE GEOREFFERENCING COEFFICIENTS BASED ON A SET OF CONTROL POINTS
*/
/***************************************************************************/

static int
CRS_compute_georef_equations (struct Control_Points *cp,
double E12[], double N12[],
double E21[], double N21[],
int order)
{
double *tempptr;
int status;

if(order < 1 || order > MAXORDER)
return(MPARMERR);

/* CALCULATE THE FORWARD TRANSFORMATION COEFFICIENTS */

status = calccoef(cp,E12,N12,order);

if(status != MSUCCESS)
return(status);

/* SWITCH THE 1 AND 2 EASTING AND NORTHING ARRAYS */

tempptr = cp->e1;
cp->e1 = cp->e2;
cp->e2 = tempptr;
tempptr = cp->n1;
cp->n1 = cp->n2;
cp->n2 = tempptr;

/* CALCULATE THE BACKWARD TRANSFORMATION COEFFICIENTS */

status = calccoef(cp,E21,N21,order);

/* SWITCH THE 1 AND 2 EASTING AND NORTHING ARRAYS BACK */

tempptr = cp->e1;
cp->e1 = cp->e2;
cp->e2 = tempptr;
tempptr = cp->n1;
cp->n1 = cp->n2;
cp->n2 = tempptr;

return(status);
}

/***************************************************************************/
/*
COMPUTE THE GEOREFFERENCING COEFFICIENTS BASED ON A SET OF CONTROL POINTS
*/
/***************************************************************************/

static int
calccoef (struct Control_Points *cp, double E[], double N[], int order)
{
struct MATRIX m;
double *a;
double *b;
int numactive; /* NUMBER OF ACTIVE CONTROL POINTS */
int status, i;

/* CALCULATE THE NUMBER OF VALID CONTROL POINTS */

for(i = numactive = 0 ; i < cp->count ; i++)
{
if(cp->status > 0)
numactive++;
}

/* CALCULATE THE MINIMUM NUMBER OF CONTROL POINTS NEEDED TO DETERMINE
A TRANSFORMATION OF THIS ORDER */

m.n = ((order + 1) * (order + 2)) / 2;


if(numactive < m.n)
return(MNPTERR);


/* INITIALIZE MATRIX */

m.v = (double *)CPLCalloc(m.n*m.n,sizeof(double));
if(m.v == NULL)
{
return(MMEMERR);
}
a = (double *)CPLCalloc(m.n,sizeof(double));
if(a == NULL)
{
CPLFree((char *)m.v);
return(MMEMERR);
}
b = (double *)CPLCalloc(m.n,sizeof(double));
if(b == NULL)
{
CPLFree((char *)m.v);
CPLFree((char *)a);
return(MMEMERR);
}


if(numactive == m.n)
status = exactdet(cp,&m,a,b,E,N);
else
status = calcls(cp,&m,a,b,E,N);

/*
CPLFree((char *)m.v);
CPLFree((char *)a);
CPLFree((char *)b);
*/

return(status);
}

/***************************************************************************/
/*
CALCULATE THE TRANSFORMATION COEFFICIENTS WITH EXACTLY THE MINIMUM
NUMBER OF CONTROL POINTS REQUIRED FOR THIS TRANSFORMATION.
*/
/***************************************************************************/

static int exactdet (
struct Control_Points *cp,
struct MATRIX *m,
double a[],
double b[],
double E[], /* EASTING COEFFICIENTS */
double N[] /* NORTHING COEFFICIENTS */
) {
int pntnow, currow, j;



currow = 1;
for(pntnow = 0 ; pntnow < cp->count ; pntnow++) {

if(cp->status[pntnow] > 0) {
/* POPULATE MATRIX M */

for(j = 1 ; j <= m->n ; j++) {
M(currow,j) = term(j,cp->e1[pntnow],cp->n1[pntnow]);
}

/* POPULATE MATRIX A AND B */

a[currow-1] = cp->e2[pntnow];
b[currow-1] = cp->n2[pntnow];

currow++;
}
}

if(currow - 1 != m->n) {
return(MINTERR);
}


return(solvemat(m,a,b,E,N));
}

/***************************************************************************/
/*
CALCULATE THE TRANSFORMATION COEFFICIENTS WITH MORE THAN THE MINIMUM
NUMBER OF CONTROL POINTS REQUIRED FOR THIS TRANSFORMATION. THIS
ROUTINE USES THE LEAST SQUARES METHOD TO COMPUTE THE COEFFICIENTS.
*/
/***************************************************************************/

static int calcls (
struct Control_Points *cp,
struct MATRIX *m,
double a[],
double b[],
double E[], /* EASTING COEFFICIENTS */
double N[] /* NORTHING COEFFICIENTS */
)
{
int i, j, n, numactive = 0;

/* INITIALIZE THE UPPER HALF OF THE MATRIX AND THE TWO COLUMN VECTORS */

for(i = 1 ; i <= m->n ; i++)
{
for(j = i ; j <= m->n ; j++)
M(i,j) = 0.0;
a[i-1] = b[i-1] = 0.0;
}

/* SUM THE UPPER HALF OF THE MATRIX AND THE COLUMN VECTORS ACCORDING TO
THE LEAST SQUARES METHOD OF SOLVING OVER DETERMINED SYSTEMS */

for(n = 0 ; n < cp->count ; n++)
{
if(cp->status[n] > 0)
{
numactive++;
for(i = 1 ; i <= m->n ; i++)
{
for(j = i ; j <= m->n ; j++)
M(i,j) += term(i,cp->e1[n],cp->n1[n]) * term(j,cp->e1[n],cp->n1[n]);

a[i-1] += cp->e2[n] * term(i,cp->e1[n],cp->n1[n]);
b[i-1] += cp->n2[n] * term(i,cp->e1[n],cp->n1[n]);
}
}
}

if(numactive <= m->n)
return(MINTERR);

/* TRANSPOSE VALUES IN UPPER HALF OF M TO OTHER HALF */

for(i = 2 ; i <= m->n ; i++)
{
for(j = 1 ; j < i ; j++)
M(i,j) = M(j,i);
}

return(solvemat(m,a,b,E,N));
}

/***************************************************************************/
/*
CALCULATE THE X/Y TERM BASED ON THE TERM NUMBER

ORDER\TERM 1 2 3 4 5 6 7 8 9 10
1 e0n0 e1n0 e0n1
2 e0n0 e1n0 e0n1 e2n0 e1n1 e0n2
3 e0n0 e1n0 e0n1 e2n0 e1n1 e0n2 e3n0 e2n1 e1n2 e0n3
*/
/***************************************************************************/

static double term (int term, double e, double n)
{
switch(term)
{
case 1: return((double)1.0);
case 2: return((double)e);
case 3: return((double)n);
case 4: return((double)(e*e));
case 5: return((double)(e*n));
case 6: return((double)(n*n));
case 7: return((double)(e*e*e));
case 8: return((double)(e*e*n));
case 9: return((double)(e*n*n));
case 10: return((double)(n*n*n));
}
return((double)0.0);
}

/***************************************************************************/
/*
SOLVE FOR THE 'E' AND 'N' COEFFICIENTS BY USING A SOMEWHAT MODIFIED
GAUSSIAN ELIMINATION METHOD.

| M11 M12 ... M1n | | E0 | | a0 |
| M21 M22 ... M2n | | E1 | = | a1 |
| . . . . | | . | | . |
| Mn1 Mn2 ... Mnn | | En-1 | | an-1 |

and

| M11 M12 ... M1n | | N0 | | b0 |
| M21 M22 ... M2n | | N1 | = | b1 |
| . . . . | | . | | . |
| Mn1 Mn2 ... Mnn | | Nn-1 | | bn-1 |
*/
/***************************************************************************/

static int solvemat (struct MATRIX *m,
double a[], double b[], double E[], double N[])
{
int i, j, i2, j2, imark;
double factor, temp;
double pivot; /* ACTUAL VALUE OF THE LARGEST PIVOT CANDIDATE */


for(i = 1 ; i <= m->n ; i++)
{

j = i;

/* find row with largest magnitude value for pivot value */

pivot = M(i,j);
imark = i;
for(i2 = i + 1 ; i2 <= m->n ; i2++)
{
temp = fabs(M(i2,j));
if(temp > fabs(pivot))
{
pivot = M(i2,j);
imark = i2;
}
}

/* if the pivot is very small then the points are nearly co-linear */
/* co-linear points result in an undefined matrix, and nearly */
/* co-linear points results in a solution with rounding error */

if(pivot == 0.0) {
return(MUNSOLVABLE);
}

/* if row with highest pivot is not the current row, switch them */

if(imark != i)
{
for(j2 = 1 ; j2 <= m->n ; j2++)
{
temp = M(imark,j2);
M(imark,j2) = M(i,j2);
M(i,j2) = temp;
}

temp = a[imark-1];
a[imark-1] = a[i-1];
a[i-1] = temp;

temp = b[imark-1];
b[imark-1] = b[i-1];
b[i-1] = temp;
}

/* compute zeros above and below the pivot, and compute
values for the rest of the row as well */

for(i2 = 1 ; i2 <= m->n ; i2++)
{
if(i2 != i)
{
factor = M(i2,j) / pivot;
for(j2 = j ; j2 <= m->n ; j2++)
M(i2,j2) -= factor * M(i,j2);
a[i2-1] -= factor * a[i-1];
b[i2-1] -= factor * b[i-1];
}
}
}

/* SINCE ALL OTHER VALUES IN THE MATRIX ARE ZERO NOW, CALCULATE THE
COEFFICIENTS BY DIVIDING THE COLUMN VECTORS BY THE DIAGONAL VALUES. */

for(i = 1 ; i <= m->n ; i++)
{
E[i-1] = a[i-1] / M(i,i);
N[i-1] = b[i-1] / M(i,i);
}

return(MSUCCESS);
}


main() {
int i;
double E12[30], N12[30], E21[30], N21[30];
int order = 1;
int ret;
double x, y, X, Y;

struct Control_Points CPs;
CPs.count = 10;
CPs.e1 = (double *)malloc(sizeof(double) * CPs.count);
CPs.n1 = (double *)malloc(sizeof(double) * CPs.count);
CPs.e2 = (double *)malloc(sizeof(double) * CPs.count);
CPs.n2 = (double *)malloc(sizeof(double) * CPs.count);
CPs.status = (int *)malloc(sizeof(int) * CPs.count);

i = 0;

/* Set some GCPs. */
CPs.e1 = 0; CPs.n1 = 0; CPs.e2 = 0; CPs.n2 = 0; CPs.status = 1; i++;
CPs.e1 = 10; CPs.n1 = 30; CPs.e2 = 100; CPs.n2 = 200; CPs.status = 1; i++;
CPs.e1 = 15; CPs.n1 = 35; CPs.e2 = 150; CPs.n2 = 220; CPs.status = 1; i++;


for(i = 0; i < 30; i++) {
E12 = N12 = E21 = N21 = 0;
}

CRS_compute_georef_equations(
&CPs,
E12, N12,
E21, N21,
order
);

for(i = 0; i < CPs.count; i++) {
x = i * 100;
y = i * 200;
CRS_georef(x, y, &X, &Y, E21, N21, order);
printf("(%lf, %lf) -> (%lf, %lf)\n", x, y, X, Y);
CRS_georef(X, Y, &x, &y, E12, N12, order);
printf("(%lf, %lf) -> (%lf, %lf)\n", X, Y, x, y);
}
}
 
C

Cameron Laird

.
.
.
I was generating the test points with a simple algorithm. I
finally discovered that the fitting algorithm fails if the
points are (close to being) colinear. .
.
.
I'll append the code that I successfully used.


It's good practice to make them interior (have ground control
points around the perimeter of the region of interest) but it
isn't always the case. The solution should be general enough
to work for outside points but it's understood that the error
for them is likely to be high.


It's something I meant to do during my Remote Sensing class
last semester and then someone asked about it in another news
group recently. It's not urgent. I just wanted to solve it.

(I have some aerial photos that I georeferenced using ERDAS
Imagine but I want to use my own code so that I can do more
with them.)

I'm happy with the C code, but I'd still prefer a Python
solution.
.
[few hundred lines of C]
.
.
Right.

The C source exhibited has no geodetic or cartographic content,
apart from a slight assymmetry in the model polynomial; it's
just a simple-minded least-squares regression against a few
leading terms (of what I hope are zero-based data--do you in
fact precondition your data?). It looks to me like a very
straightforward exercise to translate this mechanically to Python.

One ought to be able to do better, though. I don't know the
scientific packages available to Python. You mentioned previously
that you found a couple of good univariate polynomial solvers, but
no multivariate one. The cheap alternative is to pick out a
least-squares solver--there surely are several, and they're easy
to write, if not--and feed it tabulations of the terms of data:
1, x, y, (x ** 2) * y, x * (y ** 2), ... The biggest hazard is
the temptation to use too many terms.

This makes for a robust solver, is far less expensive than the
analytic work required for a formally-correct-but-practically-
indistinguishable solution, invariably, in my experience, yields
useful and/or instructive results, and, nicest of all, generalizes
nicely to appropriate transformations of the data. If you've
designed the solver correctly, it's easy to experiment with
logarithms or square roots or such. I think that's a healthy thing.

It's probably also timely to read up on R <URL:
http://www-106.ibm.com/developerworks/linux/library/l-sc16.html >.
There have been at least a couple of Python-R bindings in the last
four years. I haven't checked at all to assess their current
health.

And, oh YES, (near) colinearity indeed makes things go kablooey.
 
N

Natsu Mizutani

Hi,

How about using `scipy.optimize` at http://www.scipy.org .

From scipy_tutorial.pdf, it says
>A collection of general-purpose optimization routines.
> fmin -- Nelder-Mead Simplex algorithm
> (uses only function calls)
> fmin_powell -- Powell’s (modified) level set method (uses onlyfunction
> calls)
> fmin_bfgs -- Quasi-Newton method (can use function and gradient)
> fmin_ncg -- Line-search Newton Conjugate Gradient (can usefunction,
> gradient and hessian).
> leastsq -- Minimize the sum of squares of M equations in N unknowns
> given a starting estimate.

Note that if you want Win binary, you'll find them under
http://www.scipy.org/site_content/downloads
where you cannot reach by clicking links, I suppose.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,535
Members
45,007
Latest member
obedient dusk

Latest Threads

Top