Hypergeometric distribution

Raven

Cameron said:
This thread confuses me.

I've lost track of the real goal. If it's an exact calculation of
binomial coefficients--or even one of several other potential
targets mentioned--I echo Steven D'Aprano, and ask, are you *sure*
the suggestions already offered aren't adequate?

Hi Cameron, my real goal was to calculate the hypergeometric
distribution. The problem was that the function for hypergeometric
calculation from scipy uses the scipy.comb function which by default
uses floats so for large numbers comb(n,r) returns inf. and hence the
hypergeometric returns nan.
The first suggestion, the one by Robert Kern, resolved my problem:
Thanks to all of you guys, I could resolve my problem using the
logarithms as proposed by Robert.

Then the other guys gave alternative solutions, so I tried them out. So
for me the suggestions offered are more than adequate :)
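
For anyone following along, here is a minimal sketch of the logarithm approach described above. It is not Robert's code or scipy's implementation: it uses math.lgamma from the standard library as a stand-in for a log-gamma routine, and the function names log_choose and hypergeom_pmf, as well as the parameter names (k, M, n, N), are illustrative assumptions.

```python
from math import lgamma, exp

def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k), computed via log-gamma."""
    if k < 0 or k > n:
        return float("-inf")  # C(n, k) is 0 outside 0 <= k <= n
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_pmf(k, M, n, N):
    """P(X = k) for N draws without replacement from a population of
    size M that contains n successes (hypothetical parameter naming)."""
    # Outside the support the probability is exactly zero.
    if k < max(0, N - (M - n)) or k > min(n, N):
        return 0.0
    # Work entirely in log space so no intermediate value overflows.
    log_p = log_choose(n, k) + log_choose(M - n, N - k) - log_choose(M, N)
    return exp(log_p)

# Small case: C(2,1) * C(3,1) / C(5,2), approximately 0.6
print(hypergeom_pmf(1, 5, 2, 2))
# Large case: the individual binomial coefficients overflow a float,
# but the probability itself is an ordinary number.
print(hypergeom_pmf(10, 14000, 170, 500))
```

Only the final exp() leaves log space, so the huge coefficients never have to be represented as floats.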

Cameron said:
Also, I think you
might not realize how accurate Stirling's approximation (perhaps to
second order) is in the range of interest.

The problem with Stirling's approximation is that I need to calculate
the hypergeometric hence the factorial for numbers within a large range
e.g. choose(14000,170) or choose(5,2)
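
To make the two ends of that range concrete, a small sketch (assuming Python 3.8+ for math.comb; log_choose is a hypothetical helper, not a scipy function) shows that choose(14000,170) is fine as an exact integer but does not fit in a float, while its logarithm is unproblematic:

```python
from math import comb, lgamma, log

def log_choose(n, k):
    """Natural log of C(n, k) via log-gamma (illustrative helper)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

big = comb(14000, 170)   # exact arbitrary-precision integer
small = comb(5, 2)       # exact small integer

# Converting the big coefficient to a float overflows.
try:
    float(big)
    fits_in_float = True
except OverflowError:
    fits_in_float = False

print(small)
print(fits_in_float)
# math.log accepts arbitrarily large ints, so the two logs can be compared.
print(log(big), log_choose(14000, 170))
```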

Ale
 
Paul Rubin

Raven said:
The problem with Stirling's approximation is that I need to calculate
the hypergeometric hence the factorial for numbers within a large range
e.g. choose(14000,170) or choose(5,2)

Stirling's approximation to second order is fairly accurate even at
low values:

from math import log,exp,pi

def stirling(n):
    # approx log(n!), Stirling series through the 1/(12n) term
    return n*(log(n)-1) + .5*(log(2.*pi*n)) + 1/(12.*n)

...
1 1.00227444918
2 2.00065204769
3 6.00059914247
4 24.0010238913
5 120.002637086
To third order it's even better:

from math import log,exp,pi

def stirling(n):
    # approx log(n!), Stirling series through the 1/(360n^3) term
    return n*(log(n)-1) + .5*(log(2.*pi*n)) + 1/(12.*n) - 1/(360.*n*n*n)

...
1 0.999494216712
2 1.99995749743
3 5.99998182863
4 23.9999822028
5 119.999970391
Reference: http://en.wikipedia.org/wiki/Stirling's_approximation
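
As a rough check across the whole range mentioned in the thread, the following sketch compares the truncated series against math.lgamma, which is taken here as the reference value for log(n!). The function name stirling_log_factorial and its order argument are assumptions made for illustration; "order" follows the usage above (2 adds the 1/(12n) term, 3 also adds the 1/(360n^3) term).

```python
from math import log, exp, pi, lgamma

def stirling_log_factorial(n, order=2):
    """Approximate log(n!) with a truncated Stirling series."""
    total = n * (log(n) - 1) + 0.5 * log(2 * pi * n)
    if order >= 2:
        total += 1 / (12 * n)
    if order >= 3:
        total -= 1 / (360 * n**3)
    return total

for n in (1, 2, 5, 170, 14000):
    reference = lgamma(n + 1)  # log(n!) from the standard library
    errors = [abs(stirling_log_factorial(n, order) - reference)
              for order in (1, 2, 3)]
    # Absolute error in log(n!) for orders 1, 2 and 3
    print(n, errors)
```

The absolute error in log(n!) shrinks as n grows, so the small arguments are the worst case for the approximation.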
 
Bengt Richter

Raven said:
Hi Cameron, my real goal was to calculate the hypergeometric
distribution.

ISTM that can't have been your "real goal" -- unless you are e.g. preparing numeric
tables for publication. IOW, IWT you probably intend to USE the hypergeometric
distribution values in some useful way to go towards your "real goal." ;-)

The requirements of this USE are still not apparent to me in your posts, though
that may be because I've missed something.

Raven said:
The problem was that the function for hypergeometric
calculation from scipy uses the scipy.comb function which by default
uses floats so for large numbers comb(n,r) returns inf. and hence the
hypergeometric returns nan.
The first suggestion, the one by Robert Kern, resolved my problem:

Then the other guys gave alternative solutions so I tried them out. So
for me the suggestions offered are more than adequate :)

The problem with Stirling's approximation is that I need to calculate
the hypergeometric hence the factorial for numbers within a large range
e.g. choose(14000,170) or choose(5,2)

It seems you are hinting at some accuracy requirements that you haven't
yet explained. I'm curious how you use the values, and how that affects your
judgement of Stirling's approximation. In fact, perhaps the semantics of your
value usage could even suggest an alternate algorithmic approach to your actual end result.

Regards,
Bengt Richter
 
Robert Kern

Bengt said:
It seems you are hinting at some accuracy requirements that you haven't
yet explained. I'm curious how you use the values, and how that affects your
judgement of Stirling's approximation. In fact, perhaps the semantics of your
value usage could even suggest an alternate algorithmic approach to your actual end result.

Does it matter? Implementing Stirling's approximation is pointless when
scipy.special.gammaln() or scipy.special.gamma() does it for him.

--
Robert Kern

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
Bengt Richter

Robert said:
Does it matter? Implementing Stirling's approximation is pointless when
scipy.special.gammaln() or scipy.special.gamma() does it for him.

Who's talking about implementing Stirling's approximation? ;-) I'm trying to determine first
why the OP is thinking there's a problem with using it at all. With "alternate algorithmic
approach" I didn't mean an alternate way of calculating Stirling's approximation. I meant
to allude to the possibility that pulling a little further on the requirements thread might
even unravel some of the rationale for calculating the hypergeometric per se, depending on
how he's actually using it and why. Same old, same old: requirements, requirements ;-)

Regards,
Bengt Richter
 
