Hypergeometric distribution

Raven · Jan 4, 2006

Cameron said:
This thread confuses me.

I've lost track of the real goal. If it's an exact calculation of
binomial coefficients--or even one of several other potential
targets mentioned--I echo Steven D'Aprano, and ask, are you *sure*
the suggestions already offered aren't adequate?

Hi Cameron, my real goal was to calculate the hypergeometric
distribution. The problem was that scipy's hypergeometric function
uses scipy.comb, which by default works in floats, so for large
numbers comb(n,r) returns inf and hence the hypergeometric
returns nan.
The first suggestion, the one by Robert Kern, resolved my problem:

Thanks to all of you guys, I could resolve my problem using the
logarithms as proposed by Robert.

Then the other guys gave alternative solutions, so I tried them out. So
for me the suggestions offered are more than adequate.
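For readers following along, the logarithm approach can be sketched roughly as below. This is a minimal illustration, not Robert's actual code: it assumes Python 3 and uses the standard library's math.lgamma, and the names log_choose and hypergeom_pmf (with the N, K, n parameterisation) are hypothetical, chosen only for this example.

```python
from math import lgamma, exp

def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    if k < 0 or k > n:
        return float("-inf")  # C(n, k) is zero outside 0 <= k <= n
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items without replacement from a
    population of N items, K of which count as successes."""
    log_p = log_choose(K, k) + log_choose(N - K, n - k) - log_choose(N, n)
    return exp(log_p)  # only the final probability leaves log space

# Small and large populations go through the same code path.
print(hypergeom_pmf(1, 5, 2, 2))
print(hypergeom_pmf(3, 14000, 170, 300))
```

Because the huge binomial coefficients are only ever added and subtracted as logarithms, no intermediate value overflows to inf, and the result is exponentiated once at the end.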

Cameron said:
Also, I think you
might not realize how accurate Stirling's approximation (perhaps to
second order) is in the range of interest.

The problem with Stirling's approximation is that I need to calculate
the hypergeometric, and hence the factorial, for numbers within a large
range, e.g. choose(14000,170) or choose(5,2).

Ale

Paul Rubin · Jan 4, 2006

Raven said:
The problem with Stirling's approximation is that I need to calculate
the hypergeometric hence the factorial for numbers within a large range
e.g. choose(14000,170) or choose(5,2)

Stirling's approximation to second order is fairly accurate even at
low values:

from math import log,exp,pi

def stirling(n):
    # approx log(n!), Stirling series to second order
    return n*(log(n)-1) + .5*(log(2.*pi*n)) + 1/(12.*n)

...
1 1.00227444918
2 2.00065204769
3 6.00059914247
4 24.0010238913
5 120.002637086

To third order it's even better:

from math import log,exp,pi

def stirling(n):
    # approx log(n!), Stirling series to third order
    return n*(log(n)-1) + .5*(log(2.*pi*n)) + 1/(12.*n) - 1/(360.*n*n*n)

...
1 0.999494216712
2 1.99995749743
3 5.99998182863
4 23.9999822028
5 119.999970391

Reference: http://en.wikipedia.org/wiki/Stirling's_approximation
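To see how the two truncations behave across the range Raven mentions, one could compare them against the standard library's math.lgamma. The sketch below assumes Python 3; the function name stirling_log_factorial and its terms parameter are hypothetical, introduced only for this comparison.

```python
from math import log, pi, lgamma

def stirling_log_factorial(n, terms=3):
    """Approximate log(n!) with the Stirling series.

    terms=1 is the leading approximation, terms=2 adds 1/(12n),
    terms=3 also subtracts 1/(360 n^3).
    """
    value = n * (log(n) - 1) + 0.5 * log(2.0 * pi * n)
    if terms >= 2:
        value += 1.0 / (12.0 * n)
    if terms >= 3:
        value -= 1.0 / (360.0 * n ** 3)
    return value

# Error in log(n!) relative to lgamma(n + 1), for small and large n.
for n in (1, 2, 5, 170, 14000):
    exact = lgamma(n + 1)
    err2 = stirling_log_factorial(n, 2) - exact
    err3 = stirling_log_factorial(n, 3) - exact
    print(n, err2, err3)
```

The error is largest at n = 1 and shrinks quickly as n grows, so the same formula should serve both choose(5,2) and choose(14000,170), provided the work is done on log(n!) rather than on n! itself.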

Bengt Richter · Jan 5, 2006

Raven said:
Hi Cameron, my real goal was to calculate the hypergeometric
distribution. The problem was that the function for hypergeometric

ISTM that can't have been your "real goal" -- unless you are e.g. preparing numeric
tables for publication. IOW, IWT you probably intend to USE the hypergeometric
distribution values in some useful way to go towards your "real goal." ;-)

The requirements of this USE are still not apparent to me in your posts, though
that may be because I've missed something.

Raven said:
calculation from scipy uses the scipy.comb function which by default
uses floats so for large numbers comb(n,r) returns inf. and hence the
hypergeometric returns nan.
The first suggestion, the one by Robert Kern, resolved my problem:

Then the other guys gave alternative solutions so I tried them out. So
for me the suggestions offered are more than adequate

The problem with Stirling's approximation is that I need to calculate
the hypergeometric hence the factorial for numbers within a large range
e.g. choose(14000,170) or choose(5,2)

It seems you are hinting at some accuracy requirements that you haven't
yet explained. I'm curious how you use the values, and how that affects your
judgement of Stirling's approximation. In fact, perhaps the semantics of your
value usage could even suggest an alternate algorithmic approach to your actual end result.

Regards,
Bengt Richter

Robert Kern · Jan 5, 2006

Bengt said:
It seems you are hinting at some accuracy requirements that you haven't
yet explained. I'm curious how you use the values, and how that affects your
judgement of Stirling's approximation. In fact, perhaps the semantics of your
value usage could even suggest an alternate algorithmic approach to your actual end result.

Does it matter? Implementing Stirling's approximation is pointless when
scipy.special.gammaln() or scipy.special.gamma() does it for him.
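As a rough illustration of that point, the sketch below computes log choose(n, k) with scipy.special.gammaln and checks it against an exact integer result. It assumes Python 3.8+ (for math.comb); the fallback to math.lgamma when SciPy is not installed and the helper name log_choose are assumptions made for this example only.

```python
import math

try:
    from scipy.special import gammaln   # log-gamma from SciPy, if installed
except ImportError:
    from math import lgamma as gammaln  # stdlib fallback (assumption for this sketch)

def log_choose(n, k):
    """log of C(n, k) using log-gamma: log(n!) = gammaln(n + 1)."""
    return float(gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1))

# The exact integer exists, but it does not fit in a float.
exact = math.comb(14000, 170)
try:
    float(exact)
    overflowed = False
except OverflowError:
    overflowed = True

print("float overflow:", overflowed)
print("log choose via gammaln:", log_choose(14000, 170))
print("log of exact integer:  ", math.log(exact))
```

The logarithm stays comfortably within float range even though the coefficient itself does not, which is why gammaln covers both ends of the range in the thread.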

--
Robert Kern

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

Bengt Richter · Jan 5, 2006

Robert said:
Does it matter? Implementing Stirling's approximation is pointless when
scipy.special.gammaln() or scipy.special.gamma() does it for him.

Who's talking about implementing Stirling's approximation? ;-) I'm trying to determine first
why the OP is thinking there's a problem with using it at all. With "alternate algorithmic
approach" I didn't mean an alternate way of calculating Stirling's approximation. I meant
to allude to the possibility that pulling a little further on the requirements thread might
even unravel some of the rationale for calculating the hypergeometric per se, depending on
how he's actually using it and why. Same old, same old: requirements, requirements ;-)

Regards,
Bengt Richter
