number generator

H

Hendrik van Rooyen

Gabriel Genellina said:
You can't have two different sets with four equal numbers - it's not a
very difficult thing, it's impossible to distinguish because they're
identical!
Given 4 numbers in the set, the 5th is uniquely determined. By example:
12, 3, 10, 18 *must* end with 7. The 5th number is not "random". Any
randomness analysis should include only the first 4 numbers (or any other
set of 4).
In other words, there are only 4 degrees of freedom. In the fence analogy,
you only have to choose where to place 4 poles; the 5th is fixed at the
end.

Yes - the requirement of "5 random numbers" is in a sense in conflict
with the requirement of "that add up to 50" - because for the fifth number,
given that you have chosen the other 4, you have to "get lucky" and
choose the "right one" if you are doing it randomly...

What my question was about was whether some sort of cluster
analysis applied to the results of repeated application of two algorithms
would reveal the way in which the numbers were chosen - the first one being
choose five numbers, see if they add up to 50, repeat if not, repeat till say
1000 samples generated, and the second being choose four numbers,
see if they add to less than 50, repeat if not, else make the fifth the
difference
between the sum and 50, repeat till 1000 instances generated...

Intuitively, the second one will run a LOT faster, all other things being equal,
and I was simply wondering if one could tell the difference afterwards -
in the first case all the numbers clearly come from the same population,
while in the second case four of them do, and the fifth could possibly
be from a different population, as it was "forced" to make up the difference.

So I am not so very sure that the set of fifth numbers of the second run
would actually be indistinguishable from that of the first.

- Hendrik
 
S

Steven D'Aprano

No there are only 2611 such sets and generating them takes about 0.5
seconds on my laptop. I posted the code for this elsewhere in the thread.

If you generate all the possible sets of five numbers between 1 and 50,
there are 50**5 of them, not 2611.

Naturally you don't have to store them all, and if you read what I wrote,
you'll see I never suggested that you have to store them all. That would
be silly. Of course you would use a generator.

Now that I've seen your _partition() function, I'm quite impressed. It is
still brute-force, but it avoids generating all 50**5 non-unique lists. By
my count, it only has to throw away 11,294 lists to generate the 2611
unique lists it returns.

And it is quite fast too: approximately 0.12 seconds on my PC.
Unfortunately, it doesn't scale.

unique_partitions(60, 6) takes 0.8 seconds; unique_partitions(80, 8)
takes about 25 seconds; unique_partitions(80, 10) takes about 80 seconds,
and unique_partitions(81, 10) takes over 90 seconds.

So yes, your partition algorithm does avoid creating the millions of lists
which don't sum to 50, and that's a good thing. But it still brute-forces
thousands or millions of lists when only one is needed.

(E.g. unique_partitions(80, 10) returns 533,975 unique lists.
_partitions(80, 10) gives 3,233,568 non-unique lists.)
 
S

Steven D'Aprano

Now that I've seen your _partition() function, I'm quite impressed. It is
still brute-force, but it avoids generating all 50**5 non-unique lists. By
my count, it only has to throw away 11,294 lists to generate the 2611
unique lists it returns.

Ignore that count. I don't know where I got 11,294 from, the correct
figure is 10,358.

Oh, I seem to have found a bug in the _partitions() function. I believe it
is missing some of the partitions.
list(_partitions(25, 24)) [(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2)]
list(_partitions(25, 25))
[]
 
R

Raymond Hettinger

To make the solutions equi-probable, a simple approach is to
recursively enumerate all possibilities and then choose one of them
with random.choice().

Since people are posting their solutions now (originally only hints
were provided for the homework problem), here's mine:

def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))
 
R

Raymond Hettinger

[Alex Martelli]
map([random.randrange(5) for i in xrange(45)].count, xrange(5))

i.e., this gives 5 integers (each between 0 and 45 included) summing to
45 -- add 1 to each of them to get the desired result.

This is a really nice approach. Besides being fast, it is not too
hard to see that it is correct.

FWIW, here's a variant with a count using the new defaultdict:

n, m = 5, 45
bag = collections.defaultdict(int)
for i in xrange(m):
bag[random.randrange(n)] += 1
ans = [bag[k] for k in range(n)]
 
A

Anton Vredegoor

Raymond said:
Since people are posting their solutions now (originally only hints
were provided for the homework problem), here's mine:

Homework problem? Do you have some information from the OP that I can't
find in this thread? Anyway, I consider the 'homework' idea and the
associated self concept that turns people into -for example-
backstabbing computer scientists who never release their code as the
root of all evil in the educational system. Why should I uphold a moral
code that I don't agree with? To me people posting homework problems are
like refugees which I will help if I can.
def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))


OK back to the *computer* code. Great function! And it's ideally suited
for memoization too. Too bad this memoizor doesn't accept keyword
arguments, but for the rest it works fine and greatly speeds it up.

import random

def memoize(fn):
cache = {}
def proxy(*args):
try: return cache[args]
except KeyError: return cache.setdefault(args, fn(*args))
return proxy

@memoize
def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

def test():
print random.choice(list(genpool(5,50)))

if __name__=='__main__':
test()

A.
 
C

cesco

Since people are posting their solutions now (originally only hints
were provided for the homework problem), here's mine:

Well, I never really said it was a homework problem. Sure there are
more implications to my question that I had first imagined and I'm
really happy I could get so many insights but the question did
originate from a real problem I was facing in my research.

Thanks to everyone for the wonderful thread so far
Francesco
 
D

Duncan Smith

Duncan said:
In the interests of precision, that should probably read "are
constrained to sum to 50"; it's quite possible to generate a sequence of
independent random variates that just happen to sum to 50 (as I'm sure
you know).

A fairly efficient way of generating random multinomial variates (which
would satisfy the OP's problem)

[snip]

That is, the OP's original problem (without the positivity constraint
added in a later post).

Duncan
 
A

Alex Martelli

Raymond Hettinger said:
[Alex Martelli]
map([random.randrange(5) for i in xrange(45)].count, xrange(5))

i.e., this gives 5 integers (each between 0 and 45 included) summing to
45 -- add 1 to each of them to get the desired result.

This is a really nice approach. Besides being fast, it is not too
hard to see that it is correct.

FWIW, here's a variant with a count using the new defaultdict:

n, m = 5, 45
bag = collections.defaultdict(int)
for i in xrange(m):
bag[random.randrange(n)] += 1
ans = [bag[k] for k in range(n)]

Neat -- and you can use 1+bag[k] in the LC to adjust the results if
needed, of course.


Alex
 
D

Duncan Smith

Hendrik said:
I was thinking about the same random number generator being used in the two
disparate ways.

- Hendrik

The code I posted for multinomial variates repeatedly generates
independent Poissons until the total is less than, or equal to, the
desired total. Then the total is made up by repeatedly sampling from a
discrete distribution (multinomial with n==1). If you replace the
second stage by just increasing a single count by the necessary amount,
then the two mass functions are clearly different. I'm not sure that
I'd count this as two approaches using "the same random number generator".

Duncan
 
S

Shane Geiger

Raymond: It looks to me as if you are trying to turn a generator into
a list in the final line. That doesn't work.



Raymond said:
To make the solutions equi-probable, a simple approach is to
recursively enumerate all possibilities and then choose one of them
with random.choice().

Since people are posting their solutions now (originally only hints
were provided for the homework problem), here's mine:

def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))


--
Shane Geiger
IT Director
National Council on Economic Education
(e-mail address removed) | 402-438-8958 | http://www.ncee.net

Leading the Campaign for Economic and Financial Literacy
 
P

Paul Rubin

Steven D'Aprano said:
If you generate all the possible sets of five numbers between 1 and 50,
there are 50**5 of them, not 2611.

Yes, the idea is generate only the sets that add up to 50.
unique_partitions(60, 6) takes 0.8 seconds; unique_partitions(80, 8)
takes about 25 seconds; unique_partitions(80, 10) takes about 80 seconds,
and unique_partitions(81, 10) takes over 90 seconds.

Hmm, interesting. Maybe some more optimization is possible.
(E.g. unique_partitions(80, 10) returns 533,975 unique lists.
_partitions(80, 10) gives 3,233,568 non-unique lists.)

Well that's just 6x "inflation" due to duplicates, so I dunno what
else can be done for exhaustive enumeration. Maybe another approach
to generating the k'th partition is possible. I'll try to think about
this.
 
P

Paul Rubin

Raymond Hettinger said:
Since people are posting their solutions now (originally only hints
were provided for the homework problem), here's mine:

def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))


This generates a lot of the partitions more than once, with
possibly unequal probability.
 
A

Anton Vredegoor

Paul said:
def genpool(n, m):
if n == 1:
yield [m]
else:
for i in xrange(1, m):
for rest in genpool(n-1, m-i):
yield rest +

import random
print random.choice(list(genpool(n=4, m=20)))


This generates a lot of the partitions more than once, with
possibly unequal probability.


Well, I just noticed that with my memoization function it produces too
little :-(

But I hope you notice that this function doesn't create only partitions
but all possible outcomes?

A.
 
A

Anton Vredegoor

Anton said:
def memoize(fn):
cache = {}
def proxy(*args):
try: return cache[args]
except KeyError: return cache.setdefault(args, fn(*args))
return proxy

Sorry this doesn't work in this case. This works:

def memoize(fn):
cache = {}
def proxy(*args):
try: return cache[args]
except KeyError: return cache.setdefault(args, list(fn(*args)))
return proxy

But then we lose all speed advantages from memoizing so
it's no good either.

A.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top