number generator

Army1987

cesco said:
I have to generate a list of N random numbers (integer) whose sum is
equal to M. If, for example, I have to generate 5 random numbers whose
sum is 50 a possible solution could be [3, 11, 7, 22, 7]. Is there a
simple pattern or function in Python to accomplish that?

Thanks and regards
Francesco


You can initialize a list to [1, 1, 1, 1, 1], then generate 45 random
integers between 1 and 5; each time an integer i is generated, increase
the i-th element of the list by one.

Not all distinct lists will have the same chance of occurring, e.g. [46, 1,
1, 1, 1] will be much less likely than [10, 10, 10, 10, 10]. Depending on
what you need these numbers for, it can be a good thing or a bad thing.
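A minimal sketch of this balls-in-bins approach in modern Python (the thread's code is Python 2; `range` replaces `xrange` here):

```python
import random

def balls_in_bins(n, m):
    """Give each of the n slots a 1 to start, then hand out the
    remaining m - n units one at a time to randomly chosen slots."""
    counts = [1] * n
    for _ in range(m - n):
        counts[random.randrange(n)] += 1
    return counts

nums = balls_in_bins(5, 50)
print(nums, sum(nums))  # five integers >= 1 summing to 50
```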

--Army1987
 
Alex Martelli

Army1987 said:
cesco said:
I have to generate a list of N random numbers (integer) whose sum is
equal to M. If, for example, I have to generate 5 random numbers whose
sum is 50 a possible solution could be [3, 11, 7, 22, 7]. Is there a
simple pattern or function in Python to accomplish that?

Thanks and regards
Francesco


You can initialize a list to [1, 1, 1, 1, 1], then generate 45 random
integers between 1 and 5; each time an integer i is generated, increase
the i-th element of the list by one.

Not all distinct lists will have the same chance of occurring, e.g. [46, 1,
1, 1, 1] will be much less likely than [10, 10, 10, 10, 10]. Depending on
what you need these numbers for, it can be a good thing or a bad thing.

And a 1-liner way to get the numbers (net of the mandatory +1 for each)
is:

map([random.randrange(5) for i in xrange(45)].count, xrange(5))

i.e., this gives 5 integers (each between 0 and 45 inclusive) summing to
45 -- add 1 to each of them to get the desired result.

Without any specification regarding the distributions required for the
"5 random numbers" it's really impossible to say whether these are
better or worse than other proposed solutions.
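For what it's worth, in today's Python (where `map` returns an iterator and `xrange` is gone) the same counting trick might read:

```python
import random

# 45 draws from {0, 1, 2, 3, 4}; the i-th output is how often i was
# drawn, plus the mandatory 1, so the five numbers sum to 45 + 5 = 50.
draws = [random.randrange(5) for _ in range(45)]
nums = [draws.count(i) + 1 for i in range(5)]
print(nums, sum(nums))
```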


Alex
 
Duncan Booth

Without any specification regarding the distributions required for the
"5 random numbers" it's really impossible to say whether these are
better or worse than other proposed solutions.

FWIW, I decided it would be fun to see what kind of implementation I
could come up test driven and avoiding reading the thread or background
references too much first. So starting from:

import unittest

def partition(total, n, min):
    return [50]

class PartitionTests(unittest.TestCase):
    def testsinglevalue(self):
        self.assertEqual([50], partition(50, 1, 1))

if __name__=='__main__':
    unittest.main()

I eventually worked my way through 15 revisions to the code below.

The tests were added in the order you see them below. The commented out
function is the one I had arrived at before I added the first
distribution test which triggered a major refactor (although the code
ended up remarkably similar): if you uncomment it all but the
distribution tests pass.

I don't really like the arbitrary limits for the distribution tests, but
I'm not sure how else to test that sort of thing. And as Alex said,
without knowing what distribution the OP wanted the definition I chose
to use is completely arbitrary.

----- partition.py -------
import unittest, collections
from random import randint, sample

def partition(total, n, min):
    maxtotal = total - n*(min-1)
    posts = sorted(sample(xrange(1, maxtotal), n-1))
    return [ (b-a)+min-1 for (a,b) in zip([0]+posts, posts+[maxtotal]) ]

# def partition(total, n, min):
#     maxtotal = total - (n*min)
#     sums = sorted(randint(0, maxtotal) for i in range(n-1))
#     return [(b-a)+min for (a,b) in zip([0]+sums, sums+[maxtotal])]

class PartitionTests(unittest.TestCase):
    def testsinglevalue(self):
        self.assertEqual([50], partition(50, 1, 1))

    def testnvalues(self):
        self.assertEqual([1]*5, partition(5, 5, 1))

    def testnminusone(self):
        self.assertEqual([1]*4+[2], sorted(partition(6, 5, 1)))

    def testnminusone2(self):
        # Check we get all possible results eventually
        expected = set([(1,1,2), (1,2,1), (2,1,1)])
        for i in range(100):
            got = tuple(partition(4,3,1))
            if got in expected:
                expected.remove(got)
            if not len(expected):
                break
        self.assertEqual(len(expected), 0)

    def testdistribution(self):
        # Make sure we get each of 3 possible outcomes roughly
        # equally often.
        actual = collections.defaultdict(int)
        for i in range(1000):
            actual[tuple(partition(4,3,1))] += 1
        counts = actual.values()
        assert (min(counts) > 250)
        assert (max(counts) < 400)

    def testdistribution2(self):
        # More arbitrary limits for the distribution. 10 possible
        # outcomes this time.
        actual = collections.defaultdict(int)
        ntries = 10000
        for i in range(ntries):
            actual[tuple(partition(6,3,1))] += 1
        counts = actual.values()
        assert (min(counts) > 900)
        assert (max(counts) < 1100)

    def testmintwo(self):
        self.assertEqual([2]*50, partition(100,50,2))

    def testminzero(self):
        self.assertEqual([0]*20, partition(0,20,0))

    def testcantdoit(self):
        self.assertRaises(ValueError, partition, 100, 51, 2)

if __name__=='__main__':
    unittest.main()

--------------------------
 
Nick Craig-Wood

Paul Rubin said:
The fencepost method still seems to be simplest:

t = sorted(random.sample(xrange(1,50), 4))
print [(j-i) for i,j in zip([0]+t, t+[50])]

Mmm, nice.

Here is another effort which is easier to reason about the
distribution produced but not as efficient.

def real(N, M):
    while 1:
        t = [ random.random() for i in range(N) ]
        factor = M / sum(t)
        t = [ int(round(x * factor)) for x in t ]
        if sum(t) == M:
            break
        print "again"
    assert len(t) == N
    assert sum(t) == M
    return t

It goes round the while loop on average 0.5 times.

If 0 isn't required then just test for it and go around the loop again
if found. That of course skews the distribution in difficult to
calculate ways!
 
Carsten Haese

| Terry Reedy wrote:
|
| > Partitioning positive count m into n positive counts that sum to m is a
| > standard combinatorial problem at least 300 years old. The number of
| > such partitions, P(m,n), has no known exact formula [...]
| [...] Steven pointed out that one can view the problem like this:
|
| 00010000100010100
|
| That would be [3,4,3,1,2]
|
| where the '1' elements are like dividing shutters that partition the row
| of '0'. This means that the problem is reduced to permutations (albeit

If any of the 1s appear at the ends or together, then you would have 0s in
the partition, which is not allowed, as I understood the spec.

Correct, the OP's spec doesn't allow 0s, but the problem can be easily
transformed back and forth between positive partitions and non-negative
partitions. In order to partition M into N positive numbers, partition
(M-N) into N non-negative numbers and increase each part by 1.
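That back-and-forth transformation can be sketched like so (modern Python; the uniform `randint` fenceposts-with-repetition used here are just one arbitrary way to pick the non-negative parts):

```python
import random

def positive_parts(m, n):
    """Partition m into n positive integers by partitioning m - n into
    n non-negative integers and then adding 1 to each part."""
    # n - 1 cut points (repeats allowed) split [0, m - n] into n gaps
    cuts = sorted(random.randint(0, m - n) for _ in range(n - 1))
    parts = [b - a for a, b in zip([0] + cuts, cuts + [m - n])]
    return [p + 1 for p in parts]

print(positive_parts(50, 5))
```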

There must be some other constraint on what P(M,N) means, or I just
solved a 300-year-old problem.

-Carsten
 
Hendrik van Rooyen

Nick Craig-Wood said:
Paul Rubin said:
The fencepost method still seems to be simplest:

t = sorted(random.sample(xrange(1,50), 4))
print [(j-i) for i,j in zip([0]+t, t+[50])]

Mmm, nice.

Here is another effort which is easier to reason about the
distribution produced but not as efficient.

def real(N, M):
    while 1:
        t = [ random.random() for i in range(N) ]
        factor = M / sum(t)
        t = [ int(round(x * factor)) for x in t ]
        if sum(t) == M:
            break
        print "again"
    assert len(t) == N
    assert sum(t) == M
    return t

It goes round the while loop on average 0.5 times.

If 0 isn't required then just test for it and go around the loop again
if found. That of course skews the distribution in difficult to
calculate ways!

I have been wondering about the following as this thread unrolled:

Is it possible to devise a test that can distinguish between sets
of:

- five random numbers that add to 50, and
- four random numbers and a fudge number that add to 50?

My stats are way too small and rusty to attempt to answer
the question, but it seems intuitively a very difficult thing.

- Hendrik
 
Dick Moores

At said:
So why not just repeatedly call a function to generate lists of
length N of random integers within the appropriate range (the closed
interval [1, M-N+1]), and return the first list whose sum is M?
I don't understand what all the discussion is about. Time is not one
of the constraints.

Time is always a constraint. The sun will expand and destroy the Earth in
a couple of billion years; it would be nice to have a solution before
then...

*wink*

Seriously, almost all programming problems have two implicit constraints:
it must run as fast as practical, using as little computer resources (e.g.
memory) as practical. Naturally those two constraints are usually in
opposition, which leads to compromise algorithms that run "fast enough"
without using "too much" memory.

OK, points well-taken.

The problem posed by the OP is "Given two positive integers, N and M
with N < M, I have to generate N
positive integers such that sum(N)=M. No more constraints."

But let's say there is one more constraint--that for each n of the N
positive integers, there must be an equal chance for n to be any of
the integers between 1 and M-N+1, inclusive. Thus for M == 50 and N
== 5, the generated list of 5 should be as likely to be [1,46,1,1,1]
as [10,10,10,10,10] or [14, 2, 7, 1, 26].

Wouldn't sumRndInt() be THE solution?:
=================================================================
def sumRndInt(M, N):
    import random
    while True:
        lst = []
        for x in range(N):
            n = random.randint(1, M-N+1)
            lst.append(n)
        if sum(lst) == M:
            return lst

if __name__ == '__main__':

    N = 5
    M = 50

    lst = sumRndInt(M, N)

    print "N is %d, M is %d, lst is %s, sum(lst) is %d" % (N, M, lst, sum(lst))
==============================================================

I hope I don't seem querulous--I really want to know.

Thanks,

Dick Moores
 
Duncan Booth

Dick Moores said:
But let's say there is one more constraint--that for each n of the N
positive integers, there must be an equal chance for n to be any of
the integers between 1 and M-N+1, inclusive. Thus for M == 50 and N
== 5, the generated list of 5 should be as likely to be [1,46,1,1,1]
as [10,10,10,10,10] or [14, 2, 7, 1, 26].

I don't think what you wrote actually works. Any combination including a 46
must also have four 1s, so the digit 1 has to be at least 4 times as likely
to appear as the digit 46, and probably a lot more likely than that.

On the other hand, making sure that each combination occurs equally often
(as your example might imply) is doable but still leaves the question
whether the order of the numbers matters: are [1,46,1,1,1] and [1,1,46,1,1]
the same or different combinations?

The sorted/sample/xrange solution seems to give an even distribution of
ordered combinations; I don't know what algorithm (short of enumerating all
possible solutions and choosing one at random) gives an even distribution
when the ordering doesn't matter.
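For the ordered case, the fencepost construction quoted elsewhere in the thread is uniform over compositions (ordered lists of positive parts), which a quick tally can check empirically (modern-Python sketch; the draw count and thresholds are arbitrary):

```python
import random
from collections import Counter

def fencepost(m, n):
    # n - 1 distinct posts in 1..m-1 cut [0, m] into n positive parts
    posts = sorted(random.sample(range(1, m), n - 1))
    return tuple(b - a for a, b in zip([0] + posts, posts + [m]))

tallies = Counter(fencepost(5, 3) for _ in range(60000))
# There are C(4, 2) = 6 ordered compositions of 5 into 3 positive
# parts, so each should appear roughly 10000 times.
print(sorted(tallies.items()))
```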
 
Paul Rubin

Terry Reedy said:
P(m,n) has no known exact formula but can be computed
inductively rather easily. The partitions for m and n can be ranked in
lexicographic order from 0 to P(m,n)-1. Given a rank r in that range, one
can calculate the particular partition that has that rank. So a
equi-probable random count in the range can be turned into a equi-probable
random partition.

I don't see an efficient way to do that, but will try to look sometime
for the book you cited.

Here is a brute force generation approach (not fully tested since
I'm using Python 2.3):

from itertools import imap

# yield all partitions of n having length k, with many duplications
def _partitions(n,k):
    if k==0: return
    for i in xrange(1,n-k+1):
        for p in partitions(n-i, k-1):
            yield (i,)+p

def unique_partitions(n,k):
    return sorted(set(tuple(sorted(p)) for p in _partitions(n,k)))

p50_5 = unique_partitions(50,5)
assert len(p50_5) == 2611

Note the use of a generator expression in unique_partitions to
enumerate the non-unique partitions without remembering them all in
memory. "set" strips out the duplicates and remembers only the unique
ones. The above takes a few seconds to run on (50,5) on my 1.2 ghz laptop.
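One guess at how Terry's rank-and-unrank scheme might look, using the standard recurrence P(m,n) = P(m-1,n-1) + P(m-n,n) and ordering partitions by their smallest part (a modern-Python sketch; the book's actual ordering and method may well differ):

```python
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def P(m, n):
    # number of partitions of m into exactly n positive parts
    if n <= 0 or m < n:
        return 0
    if n == 1 or m == n:
        return 1
    return P(m - 1, n - 1) + P(m - n, n)

def count_min(m, n, low):
    # partitions of m into n parts, each part >= low
    return P(m - n * (low - 1), n)

def unrank(m, n, r, low=1):
    """The partition of m into n parts (each >= low) with rank r."""
    if n == 1:
        return (m,)
    for i in range(low, m // n + 1):
        c = count_min(m - i, n - 1, i)  # partitions with smallest part i
        if r < c:
            return (i,) + unrank(m - i, n - 1, r, i)
        r -= c
    raise ValueError("rank out of range")

total = P(50, 5)  # 2611 for (50, 5)
part = unrank(50, 5, random.randrange(total))
print(total, part, sum(part))
```

An equi-probable random rank in range(P(m, n)) then maps to an equi-probable random partition, without materializing them all.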
 
Gabriel Genellina

On Tue, 13 Mar 2007 03:20:49 -0300, Hendrik van Rooyen said:
Is it possible to devise a test that can distinguish between sets
of:

- five random numbers that add to 50, and
- four random numbers and a fudge number that add to 50?

My stats are way too small and rusty to attempt to answer
the question, but it seems intuitively a very difficult thing.

You can't have two different sets that share the same four numbers - it's
not a very difficult thing, it's impossible to distinguish them because
they're identical!
Given 4 numbers in the set, the 5th is uniquely determined. For example:
12, 3, 10, 18 *must* end with 7. The 5th number is not "random". Any
randomness analysis should include only the first 4 numbers (or any other
set of 4).
In other words, there are only 4 degrees of freedom. In the fence analogy,
you only have to choose where to place 4 poles; the 5th is fixed at the
end.
 
greg

Gabriel said:
The 5th number is not "random".

More precisely, the fifth number is not *independent*
of the others. You can't have five independent random
numbers that sum to 50; only four independent numbers
plus a dependent one.
 
Duncan Smith

Hendrik said:
Paul Rubin said:
The fencepost method still seems to be simplest:

t = sorted(random.sample(xrange(1,50), 4))
print [(j-i) for i,j in zip([0]+t, t+[50])]

Mmm, nice.

Here is another effort which is easier to reason about the
distribution produced but not as efficient.

def real(N, M):
    while 1:
        t = [ random.random() for i in range(N) ]
        factor = M / sum(t)
        t = [ int(round(x * factor)) for x in t ]
        if sum(t) == M:
            break
        print "again"
    assert len(t) == N
    assert sum(t) == M
    return t

It goes round the while loop on average 0.5 times.

If 0 isn't required then just test for it and go around the loop again
if found. That of course skews the distribution in difficult to
calculate ways!


I have been wondering about the following as this thread unrolled:

Is it possible to devise a test that can distinguish between sets
of:

- five random numbers that add to 50, and
- four random numbers and a fudge number that add to 50?

My stats are way too small and rusty to attempt to answer
the question, but it seems intuitively a very difficult thing.

- Hendrik

Yes, if the generating processes yield numbers from different
probability mass functions. You could simply look at the likelihood
ratio. Otherwise, the likelihood ratio will be 1.

Duncan
 
Steven D'Aprano

Is it possible to devise a test that can distinguish between sets
of:

- five random numbers that add to 50, and
- four random numbers and a fudge number that add to 50?

My stats are way too small and rusty to attempt to answer
the question, but it seems intuitively a very difficult thing.

Easy-peasey. Just collate the individual numbers from the lists, and graph
their frequencies. You will get quite different distributions. Both are
"random", but in different ways.

Here is some code to do it.

import random

def method1(total, count):
    """SLOW way of generating random lists."""
    L = []
    while sum(L) != total:
        L = [random.randint(1, total) for i in range(count)]
    assert sum(L) == total
    assert len(L) == count
    return L

def method2(total, count):
    if total <= 0 or count <= 0:
        return []
    elif count == 1:
        return [total]
    else:
        n = random.randint(1, total-count)
        L = [n] + method2(total-n, count-1)
        assert sum(L) == total
        assert len(L) == count
        return L


def collate_numbers(method, count):
    bins = {}
    for i in xrange(count):
        L = method(50, 5)
        for n in L:
            bins[n] = bins.get(n, 0) + 1
    return bins


I'll leave graphing the results as an exercise for the reader, but here's a
sample I prepared earlier:

{1: 208, 2: 207, 3: 158, 4: 167, 5: 162, 6: 146, 7: 142, 8: 115,
9: 114, 10: 113, 11: 94, 12: 98, 13: 92, 14: 68, 15: 60, 16: 70,
17: 58, 18: 56, 19: 57, 20: 44, 21: 25, 22: 36, 23: 27, 24: 34,
25: 23, 26: 18, 27: 18, 28: 13, 29: 13, 30: 8, 31: 13, 32: 17,
33: 7, 34: 1, 35: 5, 36: 4, 37: 3, 38: 2, 39: 1, 40: 2, 44: 1}
{1: 370, 2: 387, 3: 222, 4: 190, 5: 129, 6: 106, 7: 95, 8: 84,
9: 64, 10: 61, 11: 47, 12: 56, 13: 45, 14: 35, 15: 39, 16: 26,
17: 34, 18: 25, 19: 36, 20: 34, 21: 30, 22: 25, 23: 20, 24: 23,
25: 18, 26: 19, 27: 24, 28: 15, 29: 17, 30: 21, 31: 14, 32: 12,
33: 16, 34: 16, 35: 14, 36: 14, 37: 15, 38: 11, 39: 19, 40: 13,
41: 14, 42: 16, 43: 8, 44: 11, 45: 10}


Even from the raw numbers, the differences are obvious. A bar graph makes
it even clearer.
 
Duncan Smith

greg said:
More precisely, the fifth number is not *independent*
of the others. You can't have five independent random
numbers that sum to 50; only four independent numbers
plus a dependent one.

In the interests of precision, that should probably read "are
constrained to sum to 50"; it's quite possible to generate a sequence of
independent random variates that just happen to sum to 50 (as I'm sure
you know).

A fairly efficient way of generating random multinomial variates (which
would satisfy the OP's problem) is based on generating sequences of
independent Poisson variates (with suitably chosen parameters) and then
incrementing the counts in an appropriate fashion. A fair chunk of code
is pasted below. (If anyone knows of any more efficient methods I'd be
interested.)

Duncan

----------------------------------------------------

import math
import random

def multinomial_variate(probs, n):
    """
    Returns a random multinomial variate
    with parameters I{probs} and I{n}.

    @type probs: sequence of C{floats}
    @param probs: multinomial probabilities
    @type n: C{int}
    @param n: total
    @rtype: C{list} of C{ints}
    @return: multinomial variate
    """
    k = len(probs)
    root_n = round(n**0.5)
    if k > root_n:
        lambda_hat = n - root_n - (k - root_n)**0.5
    else:
        lambda_hat = n - k
    gens = [gen_poisson(p * lambda_hat) for p in probs]
    gen_obs = gen_discrete(probs)
    while True:
        samp = [gen.next() for gen in gens]
        tot = sum(samp)
        if tot == n:
            break
        elif tot < n:
            for _ in xrange(n - tot):
                samp[gen_obs.next()] += 1
            break
    return samp

def _alias_table(probs):
    """
    Returns lists of aliases and cutoffs (see Kronmal and
    Peterson, 1979, I{The American Statistician}, Vol.33,
    No.4., pp. 214-218).

    @type probs: sequence of I{floats}
    @param probs: sequence of probabilities
    @rtype: C{tuple} of C{lists}
    @return: a C{list} of aliases and a C{list}
        of cutoffs
    """
    n = len(probs)
    r = [n * p for p in probs] # mean of r is 1
    a = [0] * n
    G = set()
    S = set()
    # partition r into values lower and higher than mean
    for i, val in enumerate(r):
        if val < 1:
            S.add(i)
        else:
            G.add(i)
    while True:
        try:
            j, k = S.pop(), G.pop()
        except KeyError:
            break
        a[j] = k
        r[k] += r[j] - 1
        if r[k] < 1:
            S.add(k)
        else:
            G.add(k)
    return a, r

def gen_discrete(probs):
    """
    Generates discrete (0, 1, ..., I{n}-1) variates from the
    density defined by I{probs} by the alias method.

    @type probs: sequence of C{floats}
    @param probs: probabilities of corresponding values
    @rtype: generator of I{ints}
    @return: generator of independent discrete variates
    """
    a, r = _alias_table(probs)
    n = len(probs)
    rand = random.random
    while True:
        rnd_n = rand() * n
        i = int(rnd_n)
        if rnd_n - i < r[i]:
            yield i
        else:
            yield a[i]

def gen_poisson(lambd):
    """
    Generates random Poisson variates with parameter I{lambd}.

    @type lambd: C{float}
    @param lambd: mean of Poisson pmf
    @rtype: generator of I{ints}
    @return: generator of independent Poisson variates
    """
    if lambd < 10:
        rand = random.random
        L = math.exp(-lambd)
        while True:
            k = 0
            p = 1
            while p >= L:
                k += 1
                p *= rand()
            yield k - 1
    else:
        alpha = int(7 * lambd / 8)
        gamma_var = random.gammavariate
        while True:
            X = gamma_var(alpha, 1)
            if X < lambd:
                new_lambd = lambd - X
                yield alpha + poisson_variate(new_lambd)
            else:
                yield binomial_variate(alpha - 1, lambd / X)

def poisson_variate(lambd):
    """
    Returns a random Poisson variate with parameter I{lambd}.

    @type lambd: C{float}
    @param lambd: mean of Poisson pmf
    @rtype: I{int}
    @return: Poisson variate
    """
    if lambd < 10:
        rand = random.random
        L = math.exp(-lambd)
        k = 0
        p = 1
        while p >= L:
            k += 1
            p *= rand()
        return k - 1
    else:
        a = int(7 * lambd / 8)
        X = random.gammavariate(a, 1)
        if X < lambd:
            new_lambd = lambd - X
            return a + poisson_variate(new_lambd)
        else:
            return binomial_variate(a - 1, lambd / X)

def binomial_variate(n, p):
    """
    Returns a random binomial variate with parameters I{n}
    and I{p}.

    @type n: C{int}
    @param n: number of trials
    @type p: C{float}
    @param p: probability of success
    @rtype: I{int}
    @return: number of successful trials
    """
    if n < 10:
        rand = random.random
        res = 0
        for _ in range(n):
            if rand() < p:
                res += 1
        return res
    else:
        a = 1 + n // 2
        b = n - a + 1
        X = random.betavariate(a, b)
        if X >= p:
            return binomial_variate(a - 1, p / X)
        else:
            return a + binomial_variate(b - 1, (p - X) / (1 - X))
 
tleeuwenburg

It should be possible to enumerate all sets of five numbers which sum
to 50 and randomly select one of them.
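A sketch of that enumerate-and-choose idea (modern Python). For ordered lists of five positive integers summing to 50 there are C(49, 4) = 211,876 candidates, which is small enough to enumerate directly:

```python
import random

def compositions(m, n):
    # yield every ordered n-tuple of positive integers summing to m
    if n == 1:
        yield (m,)
        return
    for first in range(1, m - n + 2):
        for rest in compositions(m - first, n - 1):
            yield (first,) + rest

pool = list(compositions(50, 5))  # C(49, 4) = 211876 tuples
pick = random.choice(pool)
print(len(pool), pick, sum(pick))
```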

Cheers,
-T
 
Paul Rubin

Paul Rubin said:
# yield all partitions of n having length k, with many duplications
def _partitions(n,k):
    if k==0: return
    for i in xrange(1,n-k+1):
        for p in partitions(n-i, k-1):
            yield (i,)+p

Bah, I managed to mess up the above function during transcription.
Corrected and sped-up version:

# yield all partitions of n having length k, with many duplications
def _partitions(n,k,low=1):
    if k==1:
        yield (n,)
        return
    for i in xrange(low, n-k+1):
        for p in _partitions(n-i, k-1, i):
            yield (i,)+p
 
Steven D'Aprano

It should be possible to enumerate all sets of five numbers which sum
to 50 and randomly select one of them.

Of course it's possible. It's also a very inefficient way of doing so. For
five numbers between 1 and 50, there are 50**5 = 312,500,000 possible sets
to enumerate. Do you really think it is a good idea to generate over three
hundred and twelve million lists, just to return one?

The sole advantage of this method is that it will give "the most random"
result, with the least amount of bias possible. But it isn't clear that
the much more efficient fence-post method is any less random.

See also the Subset Sum Problem:
http://en.wikipedia.org/wiki/Subset_sum_problem
 
Paul Rubin

Steven D'Aprano said:
Of course it's possible. It's also a very inefficient way of doing so. For
five numbers between 1 and 50, there are 50**5 = 312,500,000 possible sets
to enumerate.

No, there are only 2611 such sets, and generating them takes about 0.5
seconds on my laptop. I posted the code for this elsewhere in the thread.
 
Hendrik van Rooyen

Duncan Smith said:
Yes, if the generating processes yield numbers from different
probability mass functions. You could simply look at the likelihood
ratio. Otherwise, the likelihood ratio will be 1.
I was thinking about the same random number generator being used in the two
disparate ways.

- Hendrik
 
Hendrik van Rooyen

Steven D'Aprano said:
Easy-peasey. Just collate the individual numbers from the lists, and graph
their frequencies. You will get quite different distributions. Both are
"random", but in different ways.
8< ------------ code -----------------

Thanks - I think that's too complex - I was thinking about means and std
devs and stuff for each column, and chi-squared tests of significance...

: - )

- Hendrik
 
