linregress and polyfit

K

Krishnan

I created an xy pair

y = slope*x + intercept

then I added some noise to "y" using

numpy.random.normal - call it z

I could recover the slope, intercept from (x,y) using linregress
BUT cannot calculate the slope, intercept from (x, z)

What is puzzling is that for both pairs (x,y) and (x,z) the
polyfit (order 1) works just fine (gives me the slope, intercept)
----------------------------------------------------------------------
import numpy as np
import scipy
# create a straight line, y= slope*x + intercept
x = np.linspace(0.0,10.0,21)
slope = 1.0
intercept = 3.0
y = []
for i in range(len(x)):
y.append(slope*x + intercept)
# now create a data file with noise
z= []
for i in range(len(x)):
z.append(y + 0.1*np.random.normal(0.0,1.0,1))
# now calculate the slope, intercept using linregress
from scipy import stats
# No error here this is OK, works for x, y
cslope, cintercept, r_value, p_value, std_err = stats.linregress(x,y)
print cslope, cintercept
# I get an error here
#ValueError: array dimensions must agree except for d_0
nslope, nintercept, nr_value, np_value, nstd_err = stats.linregress(x,z)
print nslope, nintercept
# BUT polyfit works fine, polynomial or order 1 with both data sets
ncoeffs = scipy.polyfit(x,z,1)
print ncoeffs
coeffs = scipy.polyfit(x,y,1)
print coeffs
 
J

Josef Pktd

I created an xy pair



y = slope*x + intercept



then I added some noise to "y" using



numpy.random.normal - call it z



I could recover the slope, intercept from (x,y) using linregress

BUT cannot calculate the slope, intercept from (x, z)



What is puzzling is that for both pairs (x,y) and (x,z) the

polyfit (order 1) works just fine (gives me the slope, intercept)

----------------------------------------------------------------------

import numpy as np

import scipy

# create a straight line, y= slope*x + intercept

x = np.linspace(0.0,10.0,21)

slope = 1.0

intercept = 3.0

y = []

for i in range(len(x)):

y.append(slope*x + intercept)

# now create a data file with noise

z= []

for i in range(len(x)):

z.append(y + 0.1*np.random.normal(0.0,1.0,1))


When z is converted to a numpy array then it has an extra dimension that linregress cannot handle, because np.random.normal(0.0,1.0, 1) returns an array and not a scalar.

much easier: use vectorized numpy instead of loop

z = y + 0.1*np.random.normal(0.0,1.0, len(y))

which is a one dimensional array and works with linregress

Josef
 
C

chitturk

Thanks - that helps ... but it is puzzling because

np.random.normal(0.0,1.0,1) returns exactly one
and when I checked the length of "z", I get 21 (as before) ...
 
D

Dave Angel

Thanks - that helps ... but it is puzzling because

np.random.normal(0.0,1.0,1) returns exactly one
and when I checked the length of "z", I get 21 (as before) ...

I don't use Numpy, so this is just a guess, plus reading one web page.

According to:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.normal.html

the 3rd argument to normal should be a tuple. So if you want a single
element, you should have made it (1,)

As for checking the 'length of "z"' did you just use the len() function?
That just tells you the first dimension. Have you tried simply printing
out "z" ?
 
C

chitturk

tried (1,) - still same error ...
printed "z" and looks right, len(z) OK
(puzzling)
 
O

Oscar Benjamin

I don't use Numpy, so this is just a guess, plus reading one web page.

According to:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.normal.html

the 3rd argument to normal should be a tuple. So if you want a single
element, you should have made it (1,)

Numpy accepts ints in place of shape tuples so this makes no difference.
As for checking the 'length of "z"' did you just use the len() function?
That just tells you the first dimension. Have you tried simply printing
out "z" ?

Exactly. What you need to check is the shape attribute (converting to
numpy array first if necessary):
import numpy as np
a = np.random.normal(0, 1, 1)
a array([-0.90292348])
a.shape (1,)
a[0] -0.90292348393433797
np.array(a[0]) array(-0.902923483934338)
np.array(a[0]).shape ()
[a, a] [array([-0.90292348]), array([-0.90292348])]
np.array([a, a])
array([[-0.90292348],
[-0.90292348]])
np.array([a, a]).shape (2, 1)
np.random.normal(0, 1, 2).shape
(2,)

The square brackets in 'array([-0.90292348])' indicate that numpy
considers this to be a 1-dimensional array of length 1 rather than a
scalar value (which would have an empty shape tuple).


Oscar
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,143
Latest member
SterlingLa
Top