number generator

Hendrik van Rooyen · Mar 14, 2007

Gabriel Genellina said:
You can't have two different sets with four equal numbers - it's not a
very difficult thing, it's impossible to distinguish because they're
identical!
Given 4 numbers in the set, the 5th is uniquely determined. By example:
12, 3, 10, 18 *must* end with 7. The 5th number is not "random". Any
randomness analysis should include only the first 4 numbers (or any other
set of 4).
In other words, there are only 4 degrees of freedom. In the fence analogy,
you only have to choose where to place 4 poles; the 5th is fixed at the
end.

Yes - the requirement of "5 random numbers" is in a sense in conflict
with the requirement of "that add up to 50" - because for the fifth number,
given that you have chosen the other 4, you have to "get lucky" and
choose the "right one" if you are doing it randomly...

What my question was about was whether some sort of cluster
analysis applied to the results of repeated application of two algorithms
would reveal the way in which the numbers were chosen - the first one being
choose five numbers, see if they add up to 50, repeat if not, repeat till say
1000 samples generated, and the second being choose four numbers,
see if they add to less than 50, repeat if not, else make the fifth the
difference
between the sum and 50, repeat till 1000 instances generated...

Intuitively, the second one will run a LOT faster, all other things being equal,
and I was simply wondering if one could tell the difference afterwards -
in the first case all the numbers clearly come from the same population,
while in the second case four of them do, and the fifth could possibly
be from a different population, as it was "forced" to make up the difference.

So I am not so very sure that the set of fifth numbers of the second run
would actually be indistinguishable from that of the first.

- Hendrik

Steven D'Aprano · Mar 14, 2007

No there are only 2611 such sets and generating them takes about 0.5
seconds on my laptop. I posted the code for this elsewhere in the thread.

If you generate all the possible sets of five numbers between 1 and 50,
there are 50**5 of them, not 2611.

Naturally you don't have to store them all, and if you read what I wrote,
you'll see I never suggested that you have to store them all. That would
be silly. Of course you would use a generator.

Now that I've seen your _partition() function, I'm quite impressed. It is
still brute-force, but it avoids generating all 50**5 non-unique lists. By
my count, it only has to throw away 11,294 lists to generate the 2611
unique lists it returns.

And it is quite fast too: approximately 0.12 seconds on my PC.
Unfortunately, it doesn't scale.

unique_partitions(60, 6) takes 0.8 seconds; unique_partitions(80, 8)
takes about 25 seconds; unique_partitions(80, 10) takes about 80 seconds,
and unique_partitions(81, 10) takes over 90 seconds.

So yes, your partition algorithm does avoid creating the millions of lists
which don't sum to 50, and that's a good thing. But it still brute-forces
thousands or millions of lists when only one is needed.

(E.g. unique_partitions(80, 10) returns 533,975 unique lists.
_partitions(80, 10) gives 3,233,568 non-unique lists.)

Steven D'Aprano · Mar 14, 2007

Now that I've seen your _partition() function, I'm quite impressed. It is
still brute-force, but it avoids generating all 50**5 non-unique lists. By
my count, it only has to throw away 11,294 lists to generate the 2611
unique lists it returns.

Ignore that count. I don't know where I got 11,294 from, the correct
figure is 10,358.

Oh, I seem to have found a bug in the _partitions() function. I believe it
is missing some of the partitions.

list(_partitions(25, 24)) [(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2)]
list(_partitions(25, 25))

Click to expand...

Click to expand...

[]

Raymond Hettinger · Mar 14, 2007

To make the solutions equi-probable, a simple approach is to

recursively enumerate all possibilities and then choose one of them
with random.choice().

Since people are posting their solutions now (originally only hints
were provided for the homework problem), here's mine:

def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))

Raymond Hettinger · Mar 14, 2007

[Alex Martelli]

map([random.randrange(5) for i in xrange(45)].count, xrange(5))

i.e., this gives 5 integers (each between 0 and 45 included) summing to
45 -- add 1 to each of them to get the desired result.

This is a really nice approach. Besides being fast, it is not too
hard to see that it is correct.

FWIW, here's a variant with a count using the new defaultdict:

n, m = 5, 45
bag = collections.defaultdict(int)
for i in xrange(m):
bag[random.randrange(n)] += 1
ans = [bag[k] for k in range(n)]

Anton Vredegoor · Mar 14, 2007

Raymond said:
Since people are posting their solutions now (originally only hints
were provided for the homework problem), here's mine:

Homework problem? Do you have some information from the OP that I can't
find in this thread? Anyway, I consider the 'homework' idea and the
associated self concept that turns people into -for example-
backstabbing computer scientists who never release their code as the
root of all evil in the educational system. Why should I uphold a moral
code that I don't agree with? To me people posting homework problems are
like refugees which I will help if I can.

def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))

OK back to the *computer* code. Great function! And it's ideally suited
for memoization too. Too bad this memoizor doesn't accept keyword
arguments, but for the rest it works fine and greatly speeds it up.

import random

def memoize(fn):
cache = {}
def proxy(*args):
try: return cache[args]
except KeyError: return cache.setdefault(args, fn(*args))
return proxy

@memoize
def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

def test():
print random.choice(list(genpool(5,50)))

if __name__=='__main__':
test()

A.

cesco · Mar 14, 2007

Since people are posting their solutions now (originally only hints

were provided for the homework problem), here's mine:

Well, I never really said it was a homework problem. Sure there are
more implications to my question that I had first imagined and I'm
really happy I could get so many insights but the question did
originate from a real problem I was facing in my research.

Thanks to everyone for the wonderful thread so far
Francesco

Duncan Smith · Mar 14, 2007

Duncan said:
In the interests of precision, that should probably read "are
constrained to sum to 50"; it's quite possible to generate a sequence of
independent random variates that just happen to sum to 50 (as I'm sure
you know).

A fairly efficient way of generating random multinomial variates (which
would satisfy the OP's problem)

[snip]

That is, the OP's original problem (without the positivity constraint
added in a later post).

Duncan

Alex Martelli · Mar 14, 2007

Raymond Hettinger said:
[Alex Martelli]

map([random.randrange(5) for i in xrange(45)].count, xrange(5))

i.e., this gives 5 integers (each between 0 and 45 included) summing to
45 -- add 1 to each of them to get the desired result.

Click to expand...

This is a really nice approach. Besides being fast, it is not too
hard to see that it is correct.

FWIW, here's a variant with a count using the new defaultdict:

n, m = 5, 45
bag = collections.defaultdict(int)
for i in xrange(m):
bag[random.randrange(n)] += 1
ans = [bag[k] for k in range(n)]

Neat -- and you can use 1+bag[k] in the LC to adjust the results if
needed, of course.

Alex

Duncan Smith · Mar 14, 2007

Hendrik said:
I was thinking about the same random number generator being used in the two
disparate ways.

- Hendrik

The code I posted for multinomial variates repeatedly generates
independent Poissons until the total is less than, or equal to, the
desired total. Then the total is made up by repeatedly sampling from a
discrete distribution (multinomial with n==1). If you replace the
second stage by just increasing a single count by the necessary amount,
then the two mass functions are clearly different. I'm not sure that
I'd count this as two approaches using "the same random number generator".

Duncan

Shane Geiger · Mar 14, 2007

Raymond: It looks to me as if you are trying to turn a generator into
a list in the final line. That doesn't work.

Raymond said:
To make the solutions equi-probable, a simple approach is to
recursively enumerate all possibilities and then choose one of them
with random.choice().

Click to expand...

Since people are posting their solutions now (originally only hints
were provided for the homework problem), here's mine:

def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))

--
Shane Geiger
IT Director
National Council on Economic Education
(e-mail address removed) | 402-438-8958 | http://www.ncee.net

Leading the Campaign for Economic and Financial Literacy

Paul Rubin · Mar 14, 2007

Steven D'Aprano said:
If you generate all the possible sets of five numbers between 1 and 50,
there are 50**5 of them, not 2611.

Yes, the idea is generate only the sets that add up to 50.

unique_partitions(60, 6) takes 0.8 seconds; unique_partitions(80, 8)
takes about 25 seconds; unique_partitions(80, 10) takes about 80 seconds,
and unique_partitions(81, 10) takes over 90 seconds.

Hmm, interesting. Maybe some more optimization is possible.

(E.g. unique_partitions(80, 10) returns 533,975 unique lists.
_partitions(80, 10) gives 3,233,568 non-unique lists.)

Well that's just 6x "inflation" due to duplicates, so I dunno what
else can be done for exhaustive enumeration. Maybe another approach
to generating the k'th partition is possible. I'll try to think about
this.

Paul Rubin · Mar 14, 2007

Raymond Hettinger said:
Since people are posting their solutions now (originally only hints
were provided for the homework problem), here's mine:

def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))

This generates a lot of the partitions more than once, with
possibly unequal probability.

Anton Vredegoor · Mar 14, 2007

Paul said:
def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))

Click to expand...

This generates a lot of the partitions more than once, with
possibly unequal probability.

Well, I just noticed that with my memoization function it produces too
little :-(

But I hope you notice that this function doesn't create only partitions
but all possible outcomes?

A.

Paul Rubin · Mar 14, 2007

Steven D'Aprano said:
list(_partitions(25, 24)) [(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2)]
list(_partitions(25, 25))

Click to expand...

Click to expand...

[]

Hmm. I'll look into this later if I get a chance. Thanks.

Anton Vredegoor · Mar 14, 2007

Anton said:
def memoize(fn):
cache = {}
def proxy(*args):
try: return cache[args]
except KeyError: return cache.setdefault(args, fn(*args))
return proxy

Sorry this doesn't work in this case. This works:

def memoize(fn):
cache = {}
def proxy(*args):
try: return cache[args]
except KeyError: return cache.setdefault(args, list(fn(*args)))
return proxy

But then we lose all speed advantages from memoizing so
it's no good either.

A.

Prime number generator	11	Jul 10, 2013
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
Minimum Total Difficulty	0	Nov 15, 2023
Trying to get the average value of the elements, please help ! JavaScript	3	Dec 13, 2022
Remove listing after being entered	8	Dec 21, 2022
HELP PLEASE	4	Jul 20, 2022
SENTINEL CONTROL LOOP WHEN DEALING WITH TWO ARRAYS	1	Oct 26, 2023

number generator

Hendrik van Rooyen

Steven D'Aprano

Steven D'Aprano

Raymond Hettinger

Raymond Hettinger

Anton Vredegoor

cesco

Duncan Smith

Alex Martelli

Duncan Smith

Shane Geiger

Paul Rubin

Paul Rubin

Anton Vredegoor

Paul Rubin

Anton Vredegoor

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads