Random Drawing Simulation -- performance issue

Discussion in 'Python' started by Brendon Towle, Sep 12, 2006.

1. Brendon TowleGuest

I need to simulate scenarios like the following: "You have a deck of
3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
replace it, and repeat N times."

So, I wrote the following code, which works, but it seems quite slow
to me. Can anyone point out some obvious thing that I'm doing
inefficiently? Or, is this more or less as good as it gets?

For reference, my typical numbers look like this:

2 <= len(population) <= 7
4 <= len(mapping) <= 50
10 <= count <= 1000000

B.

<code>
#!/usr/bin/env python

import random

def randomDrawing(count, population):
"""Simulates drawing <count> items from <population>, with
replacement.
population is a list of lists: [[count1, type1], [count2,
type2], ...]

Typical examples:
>>>randomDrawing(100, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])

[[28, 'orange'], [57, 'yellow'], [15, 'blue']]

>>>randomDrawing(100000, [[3, 'orange'], [5, 'yellow'], [2,

'blue']])
[[29923, 'orange'], [50208, 'yellow'], [19869, 'blue']]

"""
res = [[0, item[1]] for item in population]
mapping = []
for i in xrange(len(population)):
mapping.extend(*population[0])
for i in xrange(count):
index = random.choice(mapping)
res[index][0] += 1
return res

</code>

--
Brendon Towle, PhD
Cognitive Scientist
+1-412-690-2442x127
Carnegie Learning, Inc.
The Cognitive Tutor Company ®
Helping over 375,000 students in 1000 school districts succeed in math.

Brendon Towle, Sep 12, 2006

2. Yu-Xi LimGuest

Brendon Towle wrote:
> I need to simulate scenarios like the following: "You have a deck of 3
> orange cards, 5 yellow cards, and 2 blue cards. You draw a card, replace
> it, and repeat N times."
>
> So, I wrote the following code, which works, but it seems quite slow to
> me. Can anyone point out some obvious thing that I'm doing
> inefficiently? Or, is this more or less as good as it gets?

Well, a quick test with count=1000000 shows that random.choice takes up
most of the time (about 85% on my computer). I don't think there's much
else you can do about it. Coding the equivalent of random.choice
yourself isn't going to be faster.

You can try using a 1D array (a simple list) for res to save one lookup
then transforming that list to the appropriate format just before
returning it. But that's prob going to have minimal impact.

Yu-Xi Lim, Sep 12, 2006

Brendon Towle wrote:
> I need to simulate scenarios like the following: "You have a deck of 3
> orange cards, 5 yellow cards, and 2 blue cards. You draw a card, replace
> it, and repeat N times."
>
> So, I wrote the following code, which works, but it seems quite slow to
> me. Can anyone point out some obvious thing that I'm doing
> inefficiently? Or, is this more or less as good as it gets?
>
> For reference, my typical numbers look like this:
>
> 2 <= len(population) <= 7
> 4 <= len(mapping) <= 50
> 10 <= count <= 1000000
>
> B.
>
> <code>
> #!/usr/bin/env python
>
> import random
>
> def randomDrawing(count, population):
> """Simulates drawing <count> items from <population>, with replacement.
> population is a list of lists: [[count1, type1], [count2, type2], ...]
>
> Typical examples:
> >>>randomDrawing(100, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])

> [[28, 'orange'], [57, 'yellow'], [15, 'blue']]
>
> >>>randomDrawing(100000, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])

> [[29923, 'orange'], [50208, 'yellow'], [19869, 'blue']]
>
> """
> res = [[0, item[1]] for item in population]
> mapping = []
> for i in xrange(len(population)):
> mapping.extend(*population[0])
> for i in xrange(count):
> index = random.choice(mapping)
> res[index][0] += 1
> return res
>
> </code>
>
> --Brendon Towle, PhD
> Cognitive Scientist
> +1-412-690-2442x127
> Carnegie Learning, Inc.
> The Cognitive Tutor Company ®
> Helping over 375,000 students in 1000 school districts succeed in math.
>
>

Given the data structure, might it not be faster to generate binomial
variates for n-1 types, and set nth count = #draws - sum(other
outcomes)? And for a "large" count, could you get by with a normal
approximation? If you *do* feel the need for exact binomial, there are
short, somewhat sluggish routines in Devroye (1986 - on the web!, pg
525), and Random_binomial1, compiled by Alan Miller(AMrandom.zip0,
available, xlated, at http://www.esbconsult.com/. Even though they are
relatively slow, generating a few of these would seem to be a lot faster

Sorry I can't help more - I'm just starting to learn Python, have yet to
even get IDLE going.

Regards,

David J. Braden, Sep 12, 2006
4. Simon FormanGuest

Brendon Towle wrote:
> I need to simulate scenarios like the following: "You have a deck of
> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
> replace it, and repeat N times."
>
> So, I wrote the following code, which works, but it seems quite slow
> to me. Can anyone point out some obvious thing that I'm doing
> inefficiently? Or, is this more or less as good as it gets?
>
> For reference, my typical numbers look like this:
>
> 2 <= len(population) <= 7
> 4 <= len(mapping) <= 50
> 10 <= count <= 1000000
>
> B.
>
> <code>
> #!/usr/bin/env python
>
> import random
>
> def randomDrawing(count, population):
> """Simulates drawing <count> items from <population>, with
> replacement.
> population is a list of lists: [[count1, type1], [count2,
> type2], ...]
>
> Typical examples:
> >>>randomDrawing(100, [[3, 'orange'], [5, 'yellow'], [2, 'blue']])

> [[28, 'orange'], [57, 'yellow'], [15, 'blue']]
>
> >>>randomDrawing(100000, [[3, 'orange'], [5, 'yellow'], [2,

> 'blue']])
> [[29923, 'orange'], [50208, 'yellow'], [19869, 'blue']]
>
> """
> res = [[0, item[1]] for item in population]
> mapping = []
> for i in xrange(len(population)):
> mapping.extend(*population[0])
> for i in xrange(count):
> index = random.choice(mapping)
> res[index][0] += 1
> return res
>
> </code>
>
> --
> Brendon Towle, PhD
> Cognitive Scientist
> +1-412-690-2442x127
> Carnegie Learning, Inc.
> The Cognitive Tutor Company ®
> Helping over 375,000 students in 1000 school districts succeed in math.

I got nearly a 2x speed up with this variant:

def randomDrawing3(count, population):
res = [[0, item[1]] for item in population]
mapping = []
for i in xrange(len(population)):
mapping.extend(*population[0])

n = len(mapping)
for i in xrange(count):
index = int(n * random.random())
res[mapping[index]][0] += 1

return res

Apparently "int(n * random.random())" is faster than random.choice()
or random.randint()

sforman@garbage:~ \$ python -mtimeit -s'import random; n=10' 'int(n *
random.random())'
100000 loops, best of 3: 3.22 usec per loop

sforman@garbage:~ \$ python -mtimeit -s'import random; n=10'
'random.randint(1, n)'
100000 loops, best of 3: 13 usec per loop

sforman@garbage:~ \$ python -mtimeit -s'import random; n=range(10)'
'random.choice(n)'
100000 loops, best of 3: 6.07 usec per loop

(Note that the first and last examples produce values 0..9 while the
middle one produces 1..10)

I don't know for sure, but I think the random, uh, spread or whatever
will be the same for random() as for choice(). If it's important, you
should verify that. ;-)

Peace,
~Simon

Simon Forman, Sep 12, 2006