randomly generate n of each of two types

A

Alan Isaac

I need access to 2*n random choices for two types
subject to a constraint that in the end I have
drawn n of each. I first tried::

def random_types(n,typelist=[True,False]):
types = typelist*n
random.shuffle(types)
for next_type in types:
yield next_type

This works but has some obvious costs.
To avoid those I considered this instead::

def random_types(n,typelist=[True,False]):
type0, type1 = typelist
ct0, ct1 = 0,0
while ct0+ct1<2*n:
if random.random() < ((n-ct0)/(2*n-ct0-ct1)):
next_type = type0
ct0 += 1
else:
next_type = type1
ct1 += 1
yield next_type

Does this seem a good way to go? Comments welcome.

Thank you,
Alan Isaac
 
P

Paul Rubin

Alan Isaac said:
I need access to 2*n random choices for two types
subject to a constraint that in the end I have
drawn n of each. I first tried::

You mean you basically want to generate 2*n bools of which exactly
half are True and half are False? Hmm (untested):

from random import random
def random_types(n):
total = 2*n
trues_needed = n
for i in xrange(total):
if random() < float(trues_needed) / float(total):
trues_needed -= 1
yield True
else:
yield False
total -= 1
 
S

Stargaming

Alan said:
I need access to 2*n random choices for two types
subject to a constraint that in the end I have
drawn n of each. I first tried::

def random_types(n,typelist=[True,False]):
types = typelist*n
random.shuffle(types)
for next_type in types:
yield next_type

This works but has some obvious costs.
To avoid those I considered this instead::

def random_types(n,typelist=[True,False]):
type0, type1 = typelist
ct0, ct1 = 0,0
while ct0+ct1<2*n:
if random.random() < ((n-ct0)/(2*n-ct0-ct1)):
next_type = type0
ct0 += 1
else:
next_type = type1
ct1 += 1
yield next_type

Does this seem a good way to go? Comments welcome.

I think there's any easier way using random's function `shuffle`.
.... types *= n
.... shuffle(types)
.... for x in types: yield x
....
>>> randomly(1, [True, False])
>>> list(randomly(1, [True, False])) [True, False]
>>> list(randomly(2, [True, False])) [True, False, True, False]
>>> list(randomly(4, [True, False]))
[False, True, False, False, False, True, True, True]

You can even do this with an arbitrary number of choices:
>>> list(randomly(2, [True, False, None, NotImplemented]))
[NotImplemented, None, NotImplemented, False, False, None, True, True]

Notice that this method works in-place, ie
>>> a = [True, False]
>>> list(randomly(2, a)) [False, False, True, True]
>>> a # a changed!
[False, False, True, True]

This can be changed by creating a copy, though.

HTH,
Stargaming
 
A

Alan Isaac

Stargaming said:
... types *= n
... shuffle(types)

This again has the "costs" I referred to:
creating a potentially large sequence,
and shuffling it. (Additionally, shuffle
cannot achieve many of the possible
shuffles of a large list, but I doubt this
is easily evaded.)

So as far as I can tell, this is the same
approach as the original (first) solution.

Cheers,
Alan Isaac
 
S

Steven D'Aprano

This again has the "costs" I referred to:
creating a potentially large sequence,
and shuffling it. (Additionally, shuffle
cannot achieve many of the possible
shuffles of a large list, but I doubt this
is easily evaded.)

Whether that second issue is a problem or not really depends on what you
intend doing with the random objects. But notice that Python uses the
Mersenne Twister PRNG, which has a period of 2**19937-1. That's *roughly*
equivalent to the number of permutations of a list with 2080 items.

If you really care about shuffling large lists, you can chain PRNGs. For
instance, you might use shuffle() to generate a random batch of items,
then use a completely different PRNG to shuffle the list again.



As for the first issue:

def random_values(n, valuelist=[True, False]):
for _ in range(n):
values = valuelist[:]
random.shuffle(values)
for item in types:
yield item

Notice that the default objects in the list aren't types, so the name
random_types is inappropriate. The user can pass any objects they like,
not just a list of types like [int, str].

Note also that this code isn't limited to lists of just two objects. You
can pass as many or as few as you like.

It also doesn't build a large list, but any regularities in the shuffle
algorithm will show up in here. Unless you're doing cryptography, that
probably doesn't matter.

If you want to avoid shuffle, here's an alternative:

def random_values(n, valuelist=[True, False]):
N = len(valuelist)
for _ in range(N*n):
yield valuelist[random.randrange(0, N)]
 
P

Paul Rubin

Steven D'Aprano said:
If you want to avoid shuffle, here's an alternative:

def random_values(n, valuelist=[True, False]):
N = len(valuelist)
for _ in range(N*n):
yield valuelist[random.randrange(0, N)]

That is not guaranteed to yield exactly equal numbers of true and false.
 
S

Steven D'Aprano

Steven D'Aprano said:
If you want to avoid shuffle, here's an alternative:

def random_values(n, valuelist=[True, False]):
N = len(valuelist)
for _ in range(N*n):
yield valuelist[random.randrange(0, N)]

That is not guaranteed to yield exactly equal numbers of true and false.

Neither is any random number generator. That's why because it's *random*.

Ah, I see what you mean... you're reminding me that the Original Poster
seems to want a biased set of almost-but-not-quite-randomly chosen
values, so that random_values(1) must return one each of True and False
and never True, True or False, False.

You're right, that's how the O.P. implicitly specified the problem. But it
may not be what he wants.
 
P

Paul Rubin

Steven D'Aprano said:
Ah, I see what you mean... you're reminding me that the Original Poster
seems to want a biased set of almost-but-not-quite-randomly chosen
values, so that random_values(1) must return one each of True and False
and never True, True or False, False.

I thought that was what was specified, "generate n of each of two
types". So if n=10 and the two types are true and false, generate 10
trues and 10 falses. random.shuffle is the most direct way to do it
but I gave another approach that doesn't use auxiliary space except
for a few variables.
 
P

petercable

This again has the "costs" I referred to:
creating a potentially large sequence,
and shuffling it.

I thought I would see if I could do better, so I wrote this:

<code>
import random

def index_types(n, typelist=[True, False]):
numtypes = len(typelist)
total = float(n*numtypes)
counts = [0] * numtypes
while total > 0:
index = int(random.random() * numtypes)
if counts[index] < n:
counts[index] += 1
total -= 1
yield typelist[index]

def shuffle_types(n,typelist=[True,False]):
types = typelist*n
random.shuffle(types)
for next_type in types:
yield next_type

def random_types(n,typelist=[True,False]):
type0, type1 = typelist
ct0, ct1 = 0,0
while ct0+ct1<2*n:
if random.random() < ((n-ct0)/(2*n-ct0-ct1)):
next_type = type0
ct0 += 1
else:
next_type = type1
ct1 += 1
yield next_type

def test_index(n):
for x in index_types(n): pass

def test_shuffle(n):
for x in shuffle_types(n): pass

def test_random(n):
for x in shuffle_types(n): pass

if __name__ == '__main__':
from time import sleep
sleep(10)
from timeit import Timer
for function in ['test_index', 'test_shuffle', 'test_random']:
for nvalue in [1000,10000,100000,1000000]:
t = Timer("%s(%d)" % (function, nvalue), "from __main__
import %s" % function)
print function, 'n =', nvalue, ':', t.timeit(number=100)

</code>

Which yielded the following results:

pete@pete-desktop:~$ python test.py
test_index n = 1000 : 0.599834918976
test_index n = 10000 : 5.78634595871
test_index n = 100000 : 56.1441719532
test_index n = 1000000 : 562.577429056
test_shuffle n = 1000 : 0.420483827591
test_shuffle n = 10000 : 4.62663197517
test_shuffle n = 100000 : 46.3557109833
test_shuffle n = 1000000 : 498.563408852
test_random n = 1000 : 0.440869092941
test_random n = 10000 : 4.77765703201
test_random n = 100000 : 47.6845810413
test_random n = 1000000 : 485.233494043

Which shows my function is the slowest (doh!) and your second function
is the fastest. However, shuffle isn't taking much longer to create
and randomize a fairly large list (4.99 secs vs 4.85 secs for your
second function.) In my defense, I wanted to see if I could write the
function to take an arbitrarily long list, rather than limiting it to
2 items.

I also noticed your second function is not working as intended:
r = [x for x in test.random_types(10)]
r
[False, False, False, False, False, False, False, False, False, False,
True, True, True, True, True, True, True, True, True, True]

I think it needs a cast to a float:

<code>
if random.random() < (float(n-ct0)/(2*n-ct0-ct1)):
</code>

I was too impatient to wait for the n=1000000 results, so here it is
without them:

pete@pete-desktop:~$ python test.py
test_index n = 1000 : 0.583498001099
test_index n = 10000 : 5.80185317993
test_index n = 100000 : 58.8963599205
test_shuffle n = 1000 : 0.431984901428
test_shuffle n = 10000 : 4.75261592865
test_shuffle n = 100000 : 48.1326880455
test_random n = 1000 : 0.697184085846
test_random n = 10000 : 4.41986989975
test_random n = 100000 : 45.7090520859

Once again, the difference is negligible. My question is, does this
function reduce the "randomness" of the data? Doesn't it move the data
towards an alternating pattern of True, False? It seems to me that
calling shuffle, or perhaps chained PRNG as suggested by Steven
If you really care about shuffling large lists, you can chain PRNGs. For
instance, you might use shuffle() to generate a random batch of items,
then use a completely different PRNG to shuffle the list again.

would produce the most "random" data...

Pete
 
A

Alan Isaac

r = [x for x in test.random_types(10)]
r
[False, False, False, False, False, False, False, False, False, False,
True, True, True, True, True, True, True, True, True, True]

I think it needs a cast to a float:

Mea culpa. I **always** import division from __future__
and tend to assume everyone else does too.

Thanks,
Alan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top