bias in random.normalvariate??

drewlist

I'm a Python newbie and certainly no expert on statistics, but my wife
was taking a statistics course this summer and to illustrate that
sampling random numbers from a distribution and taking an average of
the samples gives you a random number as the result (bigger sample ->
smaller variance in the calculated random number, converging in on the
mean of the original distribution), I threw together this program:

#! /usr/bin/python

import random;

i=1
samplen=100
mean=130
lo=mean
hi=mean
sd=10
sum=0
while(i<=samplen):
    x=random.normalvariate(mean,sd)
    #print x
    if x<lo: lo=x
    if x>hi: high=x
    sum+=x
    i+=1
print 'sample mean=', sum/samplen, '\n'
print 'low value =', lo
print 'high value=', high
---------------------------------------------------------
But the more I run the darn thing, the stranger the results look to
me.
random.normalvariate is defined on page 89 of

http://www-acc.kek.jp/WWW-ACC-exp/KEKB/Control/Python Documents/lib.pdf

as generating points from a normal distribution with mean and standard
deviation given by the arguments. But my test program consistently
comes up with sample means that are less than the mean of the
distribution. The low value is also consistently much farther below
the mean than the high value is above it. That is, it looks to me
like the normalvariate function is biased.

Part of my being a Python newbie is I'm not really sure where to go to
discuss this problem. If this group isn't the right place, do feel
free to point me to where I ought to go.

I'm running Ubuntu Dapper and "python -V" says I've got Python
2.4.3. I tried looking in random.py down under /usr/lib but find no
clues there as to the version of the random module on my machine. Am
I missing something?

/usr/lib/python2.4$ ls -l random.py
-rw-r--r-- 1 root root 30508 2006-10-06 04:34 random.py

I added the lo and high stuff to my test program out of fear that I
was running into something funky in adding up 100 floating point
numbers. That would be more of a worry if the sample size was much
bigger, but lo and high showed apparent bias quite aside from the
calculation of the mean.

Am I committing some other obvious statistical or Python blunder?
For example, am I misunderstanding what random.normalvariate is
supposed to do?
 
Dan Bishop

Doing some testing with mu=0, sigma=1, and n=1000000 gives me means of

-0.00096407536711885962
-0.0015179019121429708
+6.9223244807378563e-05
+0.0017483897464631625
-0.0011148444018505548
+0.0015367250480148183

There appears to be no consistent bias.
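
For anyone who wants to repeat this kind of check, here is a minimal sketch in Python 3 syntax (the thread itself is about Python 2.4). The function names `sample_mean` and `run_trials` and the optional `seed` parameter are my own, not anything from the original test.

```python
import random

def sample_mean(mu, sigma, n, rng):
    """Return the mean of n draws from rng.normalvariate(mu, sigma)."""
    total = 0.0
    for _ in range(n):
        total += rng.normalvariate(mu, sigma)
    return total / n

def run_trials(trials=6, mu=0.0, sigma=1.0, n=1000000, seed=None):
    """Return a list of `trials` independent sample means.

    seed=None lets Python seed the generator itself, so every call
    gives different numbers; pass an integer for a repeatable run.
    """
    rng = random.Random(seed)
    return [sample_mean(mu, sigma, n, rng) for _ in range(trials)]
```

Calling `run_trials()` and printing the result should give a handful of small values scattered on both sides of zero, much like the list above. With the default n it takes a few seconds in pure Python.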
 
Robert Kern

Your code has an error. In the middle of your code, you changed "hi" to "high".
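
To spell out why that typo produces the lopsided extremes: "hi" is never updated, so it stays equal to the mean, and "high" is overwritten by every sample that lands above the mean. What gets printed as the high value is therefore just the last above-mean sample, not the maximum, while "lo" really is the running minimum. The sample mean is not touched by the typo. A minimal corrected sketch follows, written in Python 3 syntax rather than the original 2.4; the function name `summarize`, the `rng` parameter, and the rename of `sum` to `total` are my own choices.

```python
import random

def summarize(mean=130, sd=10, samplen=100, rng=random):
    """Draw samplen normal variates; return (sample mean, lowest, highest)."""
    lo = mean
    hi = mean
    total = 0.0  # renamed from "sum" to avoid shadowing the built-in
    for _ in range(samplen):
        x = rng.normalvariate(mean, sd)
        if x < lo:
            lo = x
        if x > hi:
            hi = x  # the original assigned to "high" here, so "hi" never moved
        total += x
    return total / samplen, lo, hi

sample_mean, lo, hi = summarize()
print('sample mean=', sample_mean)
print('low value =', lo)
print('high value=', hi)
```

With the fix in place, the low and high values should sit at roughly similar distances from 130 on most runs.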

 
Steve Holden

Robert said:
    Your code has an error. In the middle of your code, you changed "hi" to "high".

Which very nicely makes the point that you can test algorithms driven by
random data using statistical functions on the output!

regards
Steve
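
One way to make that kind of statistical check concrete: the mean of n independent draws has a standard error of sd/sqrt(n), so with sd=10 and 100 samples the sample mean should usually land within a unit or two of 130. The sketch below is only an illustration in Python 3 syntax. The helper names `standard_error` and `mean_is_plausible`, the seed, and the cutoff of 4 standard errors are assumptions of mine, not anything proposed in the thread.

```python
import math
import random

def standard_error(sd, n):
    """Standard error of the mean of n independent draws with std dev sd."""
    return sd / math.sqrt(n)

def mean_is_plausible(samples, mean, sd, k=4.0):
    """True if the sample mean lies within k standard errors of `mean`.

    k=4.0 is an arbitrary, fairly loose cutoff chosen for illustration.
    """
    n = len(samples)
    sample_mean = sum(samples) / n
    return abs(sample_mean - mean) <= k * standard_error(sd, n)

rng = random.Random(12345)  # fixed seed so the run is repeatable
samples = [rng.normalvariate(130, 10) for _ in range(100)]
print(mean_is_plausible(samples, 130, 10))
```

The same pattern could be applied to the extremes, but a check on the mean alone would not have caught the "hi"/"high" typo, since that bug only affects the reported high value.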
 
