combination function in python


DanielJohnson

How do I use the combination function in Python?

For example, 9 choose 2 (written as 9C2) = 9!/(7!*2!) = 36.

Please help, I couldn't find the function through help.
 

Alex Martelli

DanielJohnson said:
How do I use the combination function in Python?

For example, 9 choose 2 (written as 9C2) = 9!/(7!*2!) = 36.

Please help, I couldn't find the function through help.

If you download and install gmpy, it's easy:

>>> import gmpy
>>> gmpy.comb(9, 2)
mpz(36)

However, there's no equivalent function built into Python, if that's
what you're asking (gmpy's a third-party extension). It's of course
easy to write one for yourself, if you want the functionality but don't
want to download gmpy and don't care for gmpy's superb speed.


Alex
 

bearophileHUGS

DanielJohnson:
Please help, I couldn't find the function through help.

You can't find it because it's not there:

def factorial(n):
    """factorial(n): return the factorial of the integer n.
    factorial(0) = 1
    factorial(n) with n<0 is -factorial(abs(n))
    """
    result = 1
    for i in xrange(1, abs(n)+1):
        result *= i
    if n >= 0:
        return result
    else:
        return -result

def binomial(n, k):
    """binomial(n, k): return the binomial coefficient (n k)."""
    assert n>0 and isinstance(n, (int, long)) and isinstance(k, (int, long))
    if k < 0 or k > n:
        return 0
    if k == 0 or k == n:
        return 1
    return factorial(n) // (factorial(k) * factorial(n-k))

Bye,
bearophile
 

Steven D'Aprano

bearophileHUGS:

You can't find it because it's not there:

def factorial(n):
    """factorial(n): return the factorial of the integer n.
    factorial(0) = 1
    factorial(n) with n<0 is -factorial(abs(n))
    """
    result = 1
    for i in xrange(1, abs(n)+1):
        result *= i
    if n >= 0:
        return result
    else:
        return -result

def binomial(n, k):
    """binomial(n, k): return the binomial coefficient (n k)."""
    assert n>0 and isinstance(n, (int, long)) and isinstance(k, (int, long))
    if k < 0 or k > n:
        return 0
    if k == 0 or k == n:
        return 1
    return factorial(n) // (factorial(k) * factorial(n-k))


That's a naive and slow implementation. For even quite small values of n
and k, you end up generating some seriously big long ints, and then have
to (slowly!) divide them.

A better implementation would be something like this:

def binomial(n, k):
    if not 0 <= k <= n:
        return 0
    if k == 0 or k == n:
        return 1
    # calculate n!/k! as one product, avoiding factors that
    # just get canceled
    P = k+1
    for i in xrange(k+2, n+1):
        P *= i
    # if you are paranoid:
    # C, rem = divmod(P, factorial(n-k))
    # assert rem == 0
    # return C
    return P//factorial(n-k)


There's probably even a really clever way to avoid that final division,
but I suspect that would cost more in time and memory than it would save.
 

bearophileHUGS

Steven D'Aprano:
That's a naive and slow implementation. For even quite small values
of n and k, you end up generating some seriously big long ints,
and then have to (slowly!) divide them.
A better implementation would be something like this:

You are right, thank you for the improvement. (The advantage of the older implementation is that it's naive, so it's a bit more likely to be correct than more complex code I may write. For Python code I often tend to write a naive version first, create many test cases, slowly fix all the corner cases (like factorial(-5)), and only later find a faster/better implementation if I have the time or the need. If I need to do a lot of binomials, the gmpy module Alex mentioned helps.)

Bye,
bearophile
 

Jussi Piitulainen

Steven said:
bearophileHUGS wrote: ....

That's a naive and slow implementation. For even quite small values
of n and k, you end up generating some seriously big long ints, and
then have to (slowly!) divide them.

A better _definition_ of the binomial coefficient with upper index r
and lower index k is (r * (r - 1) * ...) / (k * (k - 1) * ...) with k
factors in both products. These are called falling factorial powers by
Graham, Knuth and Patashnik. Their notation is to write n^k and k^k
but with the exponent underlined; the latter is just k!, when k > 0. A
straightforward implementation below.
A better implementation would be something like this:

def binomial(n, k):
    if not 0 <= k <= n:
        return 0
    if k == 0 or k == n:
        return 1
    # calculate n!/k! as one product, avoiding factors that
    # just get canceled
    P = k+1
    for i in xrange(k+2, n+1):
        P *= i
    # if you are paranoid:
    # C, rem = divmod(P, factorial(n-k))
    # assert rem == 0
    # return C
    return P//factorial(n-k)

There's probably even a really clever way to avoid that final
division, but I suspect that would cost more in time and memory than
it would save.

Here's one non-clever one for integers n, k that uses n^k / k^k
(falling powers) with the smaller of k and n - k as lower index:

def choose(n, k):
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0
 

Mark Dickinson

def choose(n, k):
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

It might be even better to do the divisions as you go, rather than leaving them all to the end. That way the intermediate results stay smaller. So (leaving out the bounds checking) one just does:

def choose(n, k):
    ntok = 1
    for t in xrange(min(k, n-k)):
        ntok = ntok*(n-t)//(t+1)
    return ntok

Mark
 

Anton Vredegoor

We're getting closer and closer to something I already posted a few
times here. This implementation was unfortunate because I consistently
used an uncommon name for it so people couldn't easily find it (mea
culpa), and maybe also because it uses the despised reduce builtin.

def noverk(n,k):
    return reduce(lambda a,b: a*(n-b)/(b+1),xrange(k),1)

BTW I hereby give up the strange name for this and request a better name
that everybody will use so there will be no confusion anymore. Maybe
comb(n,k) ?
Here's one non-clever one for integers n, k that uses n^k / k^k
(falling powers) with the smaller of k and n - k as lower index:

def choose(n, k):
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0


Ha, my function uses smaller subproducts :)

A.
 

Jussi Piitulainen

Mark said:
It might be even better to do the divisions as you go, rather than
leaving them all to the end. That way the intermediate results
stay smaller. So (leaving out the bounds checking) one just does:

def choose(n, k):
    ntok = 1
    for t in xrange(min(k, n-k)):
        ntok = ntok*(n-t)//(t+1)
    return ntok

Yes, that's what I once saw done. Thanks. Its correctness is more subtle (the division is exact at every step because, entering step t, p equals C(n, t), and C(n, t) * (n - t) = C(n, t + 1) * (t + 1)), so I prefer to add the parentheses below to emphasise that the product has to be computed before the division. I also renamed the variable to p: it's no longer n^k (falling). And I put the bounds back in.

def choose(n, k):
    if 0 <= k <= n:
        p = 1
        for t in xrange(min(k, n - k)):
            p = (p * (n - t)) // (t + 1)
        return p
    else:
        return 0
 

bearophileHUGS

Mark Dickinson:
def choose(n, k):
    ntok = 1
    for t in xrange(min(k, n-k)):
        ntok = ntok*(n-t)//(t+1)
    return ntok

With a few tests, it seems this is faster than Jussi's version only for quite big n, k.

(Another test to be added is the assert n>0 from my original version, checking that n and k look sensible too.)
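
(One way to check a claim like this is with the timeit module. A minimal sketch; choose_jussi and choose_mark are hypothetical names standing in for the two versions posted above, assumed to be defined in the same script:)

from timeit import Timer

for n, k in [(100, 50), (10000, 5000)]:
    for name in ('choose_jussi', 'choose_mark'):
        # time 100 calls, repeated 3 times, and keep the best run
        timer = Timer('%s(%d, %d)' % (name, n, k),
                      'from __main__ import ' + name)
        print name, (n, k), min(timer.repeat(3, 100))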

Bye,
bearophile
 

mensanator

We're getting closer and closer to something I already posted a few
times here. This implementation was unfortunate because I consistently
used an uncommon name for it so people couldn't easily find it

But then, who's looking for it?
(mea culpa), and maybe also because it uses the despised reduce builtin.

def noverk(n,k):
     return reduce(lambda a,b: a*(n-b)/(b+1),xrange(k),1)

BTW I hereby give up the strange name for this and request a better name
that everybody will use so there will be no confusion anymore. Maybe
comb(n,k) ?

No, that name's already used by gmpy. And a single function doesn't replace all of gmpy's other functionality, much of which is needed in the same applications where comb() is used, so there's really no need for your function.

Your time is better spent applying the tools provided
by gmpy rather than trying to re-invent the wheel.
 

Mark Dickinson

With a few tests, it seems this is faster than Jussi's version only for quite big n, k.

True---and for large n and k it's difficult to imagine any real-world
applications that wouldn't be better served by using the lngamma
function instead. That is, the natural log of n choose k can be
computed much more quickly as

lngamma(n+1) - lngamma(k+1) - lngamma(n-k+1)

I assume that lngamma is implemented somewhere in either numpy or
scipy, but right now I can't find it...
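
(For reference, scipy.special.gammaln is one such log-gamma function. A minimal sketch of the idea, assuming SciPy is installed; the name log_choose is just illustrative:)

from math import exp
from scipy.special import gammaln   # natural log of the gamma function

def log_choose(n, k):
    # natural log of C(n, k); assumes 0 <= k <= n
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

print int(round(exp(log_choose(9, 2))))   # 36; exact only while C(n, k) is small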

Mark
 

Anton Vredegoor

But then, who's looking for it?

The OP was trying to find it in the docs, assuming it was some kind of
builtin function. I can see the logic of that and I can also see the
logic of including some other smallish functions like for example fac.
Failing that, the next recourse would be the Internet, or a Usenet
search but that would imply well named Usenet posts and function names.

This is a rather concise function which has the added advantage that it
returns 0 when k>n.
No, that name's already used by gmpy. And a single function doesn't replace all of gmpy's other functionality, much of which is needed in the same applications where comb() is used, so there's really no need for your function.

Actually, by proposing comb I am *honoring* gmpy and I'm also complying
with it. In Python we use namespaces to differentiate between such
things. You might want to read the documentation about it some time.
Your time is better spent applying the tools provided
by gmpy rather than trying to re-invent the wheel.

Please let me be the judge of how to spend my time. In this case it
seems rather unjustified to introduce dependencies on external modules
when only a few functions are needed. Since I'm also interested in the
functions themselves -in this case- I'd rather have a few lines of code
in my source than importing an optimized code library. There *are*
situations where such things are completely justified, but I don't think
this is one of them. You could take it up with the gmpy author and
induce him to get gmpy included in the standard distro if you are so
inclined.

A.
 

Steven D'Aprano

... for large n and k it's difficult to imagine any real-world
applications that wouldn't be better served by using the lngamma
function instead. That is, the natural log of n choose k can be
computed much more quickly as

lngamma(n+1) - lngamma(k+1) - lngamma(n-k+1)

Yes, but if you actually want the number of combinations, not the log of
the number of combinations, you then have to raise e to that power to get
the answer you want -- and that takes time. Is it still faster? How about
round-off error due to floating point? Should you be truncating or
rounding? What about overflow error for large enough n, k?

What exactly are we doing with a five hundred digit long integer anyway?

Exact (long) integer maths still is useful.
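
(To make the round-off point concrete, a small sketch, assuming Mark's choose() from earlier in the thread and a gammaln-based log_choose() like the one sketched above:)

from math import exp

exact = choose(100, 50)             # an exact 30-digit long integer
approx = exp(log_choose(100, 50))   # a float carrying only ~16 significant digits
# the two agree to only about 15 digits, and for much larger n and k
# exp() of the log-result overflows a double entirely
print exact
print approx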

I assume that lngamma is implemented somewhere in either numpy or
scipy, but right now I can't find it...

You know what they say about what happens when you ASS_U_ME :)
 

mensanator

The OP was trying to find it in the docs, assuming it was some kind of
builtin function. I can see the logic of that and I can also see the
logic of including some other smallish functions like for example fac.
Failing that, the next recourse would be the Internet, or a Usenet
search but that would imply well named Usenet posts and function names.

Isn't that what docstrings are for? Can't you leave
the function name noverk() and add something to the
effect of "this function calculates combinations"?
Then it would show up in searches, wouldn't it?
This is a rather concise function which has the added advantage that it
returns 0 when k>n.

import gmpy
print gmpy.comb(4,8)

## 0
Actually, by proposing comb I am *honoring* gmpy and
I'm also complying with it.

Perhaps you should use the name

comb_works_just_like_the_gmpy_version_only_slower()
In Python we use namespaces to differentiate between such
things.

Provided they get used.
You might want to read the documentation about it some time.

But *I* always use namespaces, as *I've* read the documentation.
But when there *is* no documentation, that becomes a problem,
doesn't it?
Please let me be the judge of how to spend my time.

Suit yourself.
In this case it seems rather unjustified to introduce
dependencies on external modules when only a few functions
are needed.

Unless you don't know how to write the functions you need
in which case you're better off relying on external
modules. Have you ever wondered why Python even *has*
a standard library? Why doesn't everyone just write
the functionality they need?
Since I'm also interested in the
functions themselves -in this case- I'd rather have a
few lines of code in my source than importing an
optimized code library.

Well, some people prefer optimized libraries because they have real work to do, not just academic exercises.
There *are*
situations where such things are completely justified,
but I don't think this is one of them.

So it doesn't matter whether anyone can find noverk(),
does it?
You could take it up with the gmpy author and
induce him to get gmpy included in the standard distro if you are so
inclined.

Alex Martelli knows more about that subject than I and
it would be pointless for me to bug him about it.
 

Alex Martelli

Alex Martelli knows more about that subject than I and
it would be pointless for me to bug him about it.

gmpy is LGPL (because so is the underlying GMP) and therefore very
unlikely to be included in the Python distribution, whose core
committers, collectively, choose laxer, looser licensing labels.

Of course, there being no "secret algorithm" for the computation of binomial coefficients in GMP, reimplementing comb "from scratch" (_without_ peeking at GMP's sources, so it can get a normal Python license) as a Python extension with similar performance to gmpy.comb is a SMOP (for which I'm not the best candidate, since I may in the past have peeked at GMP's sources, and although I don't consciously recall anything about the binomial-coefficients computation [except that there was nothing particularly exciting about it:)] we surely don't want to risk the FSF suing the PSF, now do we...).


Alex
 

Anton Vredegoor

Isn't that what docstrings are for? Can't you leave
the function name noverk() and add something to the
effect of "this function calculates combinations"?
Then it would show up in searches, wouldn't it?

Yes, a docstring would help it show up in searches; however, since this thread now contains a collection of function names, that will suffice.

There is also the other consideration of being able to easily read code that others write. It would help if functions that try to do the same thing had the same name. If we had some module in the standard distro that contained comb and fac and maybe a few other functions, people could start using the same name to point to the same thing, etc.
import gmpy
print gmpy.comb(4,8)

## 0

There is more to it than that. If I see the way the function computes
the values it gives me hints about how I could write a function that
lists all combinations.

For example the fact that one can divide the product 'on the fly' -I
mean without computing the totals above and below the division bar-
tells me something about how to tackle the problem of generating the
corresponding combinatorial structure. I *love* combinatorial
structures. Little functions like this have more than once helped me to
understand them better.
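
(As a small illustration of the kind of thing meant here, one simple recursive way to list the combinations themselves, mirroring the identity C(n, k) = C(n-1, k-1) + C(n-1, k); this is just a sketch, not anyone's posted code:)

def combinations(seq, k):
    # yield every k-element combination of seq as a tuple:
    # either take the first element or skip it
    if k == 0:
        yield ()
    elif len(seq) >= k:
        first, rest = seq[0], seq[1:]
        for tail in combinations(rest, k - 1):
            yield (first,) + tail
        for combo in combinations(rest, k):
            yield combo

print list(combinations('abcd', 2))   # 6 pairs, matching choose(4, 2)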

Math is not so much my way of describing things, I like executable
pseudo code better. Sometimes I had to read math formulas and have
become somewhat used to that notation -it has its uses too- but if
things are written down in code I can rely on the *computer* to execute
the code instead of my brain which unfortunately is not as reliable.

These advantages are lost when one just imports some optimized code library.
Perhaps you should use the name

comb_works_just_like_the_gmpy_version_only_slower()

Or maybe 'comb_in_executable_pseudocode', but maybe some other
implementations posted in this thread would be better suited for that.

Anyway, I have the impression you just don't get the point of people posting various functions so that one can determine which algorithms work best, or so that one can include each other's ideas and create better functions that combine the best elements of all posted code. Cutting and pasting Python functions works a lot better on Usenet than importing C libraries.

If no one posts code -even if it's not completely working code- how are
we to learn from each other? The fact that someone else somewhere
already saw the light and wrote the perfect function in optimized C or
Fortran code should not stop us from 'reinventing the wheel' -as you
call it- because one can get more proficient in wheel making in the
process and every now and then it even leads to better wheels.

Don't even get me started about this archaic scheme of writing code only
for 'work' instead of for gaining insight that you seem to promote.
But when there *is* no documentation, that becomes a problem,
doesn't it?

This thread is also documentation and I hope it will encourage people to
post more code and share thoughts.
Unless you don't know how to write the functions you need
in which case you're better off relying on external
modules. Have you ever wondered why Python even *has*
a standard library? Why doesn't everyone just write
the functionality they need?

The main virtue of Python's standard library should be that it shows one how to do things and that it introduces a common naming scheme. That was what got me interested in Python in the first place: now I got to see how things work! No more dark secrets and proprietary code!

Sometimes it's better to sacrifice clarity for speed and write optimized
lower level code but that doesn't mean there aren't large trade offs
involved.

In the future, if computers become faster, it would be wise to drop the optimized code libraries again and reintroduce the Python versions of the same, because by then the Python versions would be fast enough, and it would also result in a smaller code base (but a more functional, flexible and readable one).

What are you doing on this planet anyway if it's not trying to
understand things?
Well, some people prefer optimized libraries because they have real work to do, not just academic exercises.

Real work is just an excuse for having a larger slice of the cake than
other people. Other people can be just as or even more essential in the
whole process of developing code than those who get paid.

A reason against using third party libraries in general is not wanting
to include too many external dependencies that would force the user to
download all kinds of things from different places which could change
their urls or web interfaces. It would also introduce more points where
version conflicts or license conflicts could occur.
So it doesn't matter whether anyone can find noverk(),
does it?

It does matter a lot in the long run if people can compare their code to
the code other people write. If still other people profit from these
public exchanges and then adopt a condescending attitude towards the
grass roots processes -either because of misguided adoration for the
best programmers or because they think themselves to be the best or
maybe because of some archaic work related value system- that would
dampen the enthusiasm a bit.
Alex Martelli knows more about that subject than I and
it would be pointless for me to bug him about it.

Perhaps even your kind of misguided criticism would be better than such
a fatalistic attitude.

A.
 

mensanator

Yes, a docstring would help it show up in searches; however, since this thread now contains a collection of function names, that will suffice.

There is also the other consideration of being able to easily read code that others write. It would help if functions that try to do the same thing had the same name.

Why not simply the same doc string? You can't expect everyone to
use the same function name.
If we had some module in the standard distro that contained comb and fac and maybe a few other functions, people could start using the same name to point to the same thing, etc.

You COULD add those to the standard distro. You COULD read the
newsgroups and get various Pythonic solutions. But you don't
HAVE to if you're looking just for a solution.
There is more to it than that. If I see the way the function computes
the values it gives me hints about how I could write a function that
lists all combinations.

But you said returning 0 was an advantage, as if that's not
typical. I was merely pointing out that it is NOT an advantage
over gmpy. Giving you hints about further application is
a different advantage than what you listed.
For example the fact that one can divide the product 'on the fly' -I
mean without computing the totals above and below the division bar-
tells me something about how to tackle the problem of generating the
corresponding combinatorial structure. I *love* combinatorial
structures. Little functions like this have more than once helped me to
understand them better.

Fine, and gmpy doesn't give you the actual combinations, so I wrote
my own. But if there's a faster, better 3rd party library that
can do the same things mine does (permutations with replacement,
combinations with replacement, permutations without replacement and
combinations without replacement) then I'll drop mine like a live
grenade.
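
(For what it's worth, later Python versions cover all four of these in the standard itertools module; a brief sketch, assuming Python 2.7 or newer, which is not what the thread's posters were using:)

from itertools import (product, permutations,
                       combinations, combinations_with_replacement)

items = 'abc'
print list(product(items, repeat=2))                  # permutations with replacement
print list(permutations(items, 2))                    # permutations without replacement
print list(combinations_with_replacement(items, 2))   # combinations with replacement
print list(combinations(items, 2))                    # combinations without replacement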

I wrote this because I needed to use it in a bigger application,
the details were not important, just the output. It's the bigger
applications details I'm concerned about.
Math is not so much my way of describing things, I like executable
pseudo code better. Sometimes I had to read math formulas and have
become somewhat used to that notation -it has its uses too- but if
things are written down in code I can rely on the *computer* to execute
the code instead of my brain which unfortunately is not as reliable.

These advantages are lost when one just imports some optimized code library.

And once I looked up the Extended Euclidean Algorithm, made a Python implementation of it and, after I got it all working, found it was all unnecessary because there was a gmpy function already present (which I had previously ignored because I didn't understand what it meant) that did exactly what I wanted. Yes, I'm glad I went through the trouble of learning the algorithm, but I simply threw away my version.

You talk about losing the advantage of learning how to do this,
but is there any difference between your code and gmpy as far as
the OP is concerned? Is he going to actually learn anything from
your program or is he just going to use it? Is it going to matter
to the OP whether he uses a 3rd party module or some code (that he
doesn't understand) that he found on Usenet?
Or maybe 'comb_in_executable_pseudocode', but maybe some other
implementations posted in this thread would be better suited for that.

Anyway, I have the impression you just don't get the point of people posting various functions so that one can determine which algorithms work best, or so that one can include each other's ideas and create better functions that combine the best elements of all posted code. Cutting and pasting Python functions works a lot better on Usenet than importing C libraries.

"Works better" for learning the algorithms, doesn't "work better"
performance wise.
If no one posts code -even if it's not completely working code- how are
we to learn from each other?

By focusing on things that aren't available in 3rd party modules.
The fact that someone else somewhere
already saw the light and wrote the perfect function in optimized C or
Fortran code should not stop us from 'reinventing the wheel' -as you
call it- because one can get more proficient in wheel making in the
process and every now and then it even leads to better wheels.

But there's no reason why anyone should use your re-invented wheel
in place of the old wheel.
Don't even get me started about this archaic scheme of writing code only
for 'work' instead of for gaining insight that you seem to promote.

I didn't mean 'work' in any kind of business sense (all my Python
code is strictly amateur mathematics). I meant when you have a task
to solve, learning how the tools are made is not important, it's
USING the tools. Sure, it's possible to have gained some insight
into using the tools by knowing how they're made, but as I said with
the EEA, you can keep the insight and throw the program away if there's a better alternative in a 3rd party module.
This thread is also documentation and I hope it will encourage people to
post more code and share thoughts.


The main virtue of Python's standard library should be that it shows one how to do things and that it introduces a common naming scheme. That was what got me interested in Python in the first place: now I got to see how things work! No more dark secrets and proprietary code!

I think you would get an argument that that's its MAIN virtue.
Sometimes it's better to sacrifice clarity for speed and write optimized
lower level code but that doesn't mean there aren't large trade offs
involved.

And sometimes it's not. Sometimes high-level clarity is better than
low-level clarity. I have no concern about HOW the modular inverse
function works, what I care about is how to use it to solve a linear
congruence.
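
(For concreteness, a minimal pure-Python sketch of that use, solving a*x = b (mod m); this is illustrative code, not mensanator's or gmpy's:)

def egcd(a, b):
    # extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b)
    x0, x1, y0, y1 = 1, 0, 0, 1
    while b:
        q, a, b = a // b, b, a % b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def solve_linear_congruence(a, b, m):
    # smallest non-negative x with a*x = b (mod m), or None if no solution
    g, x, _ = egcd(a, m)
    if b % g:
        return None
    return (x * (b // g)) % (m // g)

print solve_linear_congruence(7, 3, 31)   # 27, since 7*27 = 189 = 6*31 + 3
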
In the future, if computers become faster, it would be wise to drop the optimized code libraries again and reintroduce the Python versions of the same, because by then the Python versions would be fast enough, and it would also result in a smaller code base (but a more functional, flexible and readable one).

You're wrong here. Python and the computers it runs on will NEVER be fast enough, even if Python were compiled and executing native code. When things get faster, the problems get bigger. All the improvement does is push back the threshold of intractability; it can never make it go away.
What are you doing on this planet anyway if it's not trying to
understand things?

I'm trying to understand what *I* want to understand and ignore
what I don't NEED to understand.
Real work is just an excuse for having a larger slice of the cake than
other people. Other people can be just as or even more essential in the
whole process of developing code than those who get paid.

As I said earlier, nothing to do with getting paid. The problems
I work on are bigger than algorithms for calculating combination
counts, so my focus is on them and if I can employ a 3rd party
module, then that's more effort I can devote to the problem.
A reason against using third party libraries in general is not wanting
to include too many external dependencies that would force the user to
download all kinds of things from different places which could change
their urls or web interfaces. It would also introduce more points where
version conflicts or license conflicts could occur.

Strange, you think posting code on Usenet to share with others is good, yet you disparage those who formalize that sharing as a module.
It does matter a lot in the long run if people can compare their code to
the code other people write. If still other people profit from these
public exchanges and then adopt a condescending attitude towards the
grass roots processes -either because of misguided adoration for the
best programmers or because they think themselves to be the best or
maybe because of some archaic work related value system- that would
dampen the enthusiasm a bit.

I think you're wrong here. I didn't say people shouldn't be posting these solutions, I said anyone who knows better won't be looking for them.
Trust me, I was MUCH more enthusiastic about solving my linear congruence problem (which I had been toying with for about a year - calendar time) than I was over knowing how to do an Extended Euclidean Algorithm.
Perhaps even your kind of misguided criticism would be better than such
a fatalistic attitude.

Fatalistic? I knew there were reasons, I just didn't care to make
any effort to remember them and saw no reason to look them up.
 
