Pick items from a list with probability based upon a property of the list member?

southof40

I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of: cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This works but seems very clunky to me. Can anyone suggest a better
data structure which would support the 'weighted randomness' I'm
after?

I'm not fixed on the idea of using a list - it could be a dictionary,
tree, whatever.

Thanks in advance.

R.

Stefan Behnel

southof40, 20.06.2010 12:19:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This works but seems very clunky to me.

Why? It's a very simple, generic, easy to understand and fast solution to
the problem.

Stefan

Steven D'Aprano

I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

That adds to a probability of 1.1, which is impossible.

I'm going to assume that bikes is a typo and should be 0.2.


cars = [car1, car2]
bikes = [bike1, bike2, bike3, bike4, bike5, bike6]
trucks = [truck1, truck2, truck3, truck4]

x = random.random()
if x < 0.7:
    return random.choice(cars)
elif x < 0.9:
    return random.choice(bikes)
else:
    return random.choice(trucks)


But surely this is not physically realistic? The probability of selecting
a car would normally depend on the number of cars, and not be set
beforehand.

Rob Williscroft

southof40 wrote in (e-mail address removed) in gmane.comp.python.general:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

Aside, all your probabilities add up to 1.1, they should add up to 1.
I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

Aside, so 7/11 cars, 3/11 bikes and 1/11 trucks are your
actual probabilities.

But to answer your question, you could create three lists, then pick
the list you draw from based on one random number, and pick the item
from that list based on another random number:

r = random() * 11

if r < 1:
    picklist = truck_list
elif r < 4:
    picklist = bike_list
else:
    picklist = car_list

# now pick the final item from picklist.


Rob.

Mel

southof40 said:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This seems about right. It's like a lottery where the more likely
winners have more tickets, but all tickets are the same. Pick one to
pick the winner.

There's a very expensive, but flexible technique that effectively gives
some tickets a better chance than others. You have to examine each
ticket individually, so this algorithm burns random numbers like
kindling:



import random

#===============================
def weighted_choice (candidates, weight):
    chosen = None
    total = 0
    for c in candidates:
        w = weight (c)
        total += w
        if random.randint (0, total-1) < w:
            chosen = c
    return chosen
#===============================

def test_weight (c):
    return {'c':7, 'b':3, 't':1}[c]

def item_count (s, target):
    return sum (1 for x in s if x==target)

test_candidates = 'c'*100 + 'b'*100 + 't'*100

for i in xrange (10):
    test = [weighted_choice (test_candidates, test_weight) for k in xrange (100)]
    for x in 'cbt':
        print x, item_count (test, x), '\t',
    print




Mel.

Nobody

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This works but seems very clunky to me. Can anyone suggest a better
data structure which would support the 'weighted randomness' I'm
after ?

Calculate the cumulative probabilities and perform a binary search (e.g.
using the bisect module).
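
Nobody's suggestion might look like the sketch below (the vehicle names and weights are placeholders for illustration, not from the thread). The cumulative table is built once; each draw is then one random number plus an O(log N) binary search:

```python
import bisect
import random

def weighted_choice(candidates, weights):
    """Pick one candidate with probability proportional to its weight."""
    # Build the cumulative-weight table once; reuse it for many draws.
    cumulative = []
    total = 0.0
    for w in weights:
        total += w
        cumulative.append(total)
    # Draw a point in [0, total] and binary-search for its bucket;
    # the min() guards the (rare) endpoint x == total.
    x = random.uniform(0, total)
    i = min(bisect.bisect_right(cumulative, x), len(candidates) - 1)
    return candidates[i]
```

In a real program you would build the cumulative table once outside the function and pass it in, so repeated draws pay only the search cost.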

Steven D'Aprano

This seems about right. It's like a lottery where the more likely
winners have more tickets, but all tickets are the same. Pick one to
pick the winner.

It could be expensive to calculate though. Suppose the probabilities were:

0.608729
0.235012
0.156259

There's a very expensive, but flexible technique that effectively gives
some tickets a better chance than others. You have to examine each
ticket individually, so this algorithm burns random numbers like
kindling:



import random

#===============================
def weighted_choice (candidates, weight):
    chosen = None
    total = 0
    for c in candidates:
        w = weight (c)
        total += w
        if random.randint (0, total-1) < w:
            chosen = c
    return chosen
#===============================

Nice! But instead of randint(0, total-1), you can use randrange(0, total).

This only requires N random numbers (for N candidates), which isn't
really that excessive, but we can do better, and support non-integer
weights as well.

def weighted_choice(candidates, weights):
    """Choose between a list of candidates with a list of weights."""
    cumulative_weights = []
    running_total = 0
    for w in weights:
        running_total += w
        cumulative_weights.append(running_total)
    x = random.uniform(0, running_total)
    for item, cw in zip(candidates, cumulative_weights):
        if x <= cw:
            return item

This is O(N) on the number of candidates, and requires one call to random
no matter what N is.
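
This cumulative-weights scheme is essentially what `random.choices` (added to the standard library in Python 3.6, years after this thread) provides built-in; the vehicle names here are placeholders:

```python
import random

vehicles = ['car', 'bike', 'truck']   # placeholder objects
weights = [0.7, 0.2, 0.1]             # per the corrected probabilities above

picked = random.choices(vehicles, weights=weights, k=1)[0]
```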

Paul Rubin

southof40 said:
I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

You can do it with one pass through the list using a well-known online
algorithm (I don't remember what it's called). Untested code:

import random
tprob = 0  # total probability
for x in vehicle_list:
    tprob += x.prob  # x.prob is 0.7, 0.3, 0.1, or whatever
    if random.uniform(0, tprob) <= x.prob:
        selected = x

To see why this works, use mathematical induction. It's obviously right
if there is just one object in the list: tprob and x.prob are the same
thing, so the random number is always in range and that object gets
chosen. And suppose it works correctly for n objects. Then it also
works correctly for n+1 objects, since the (n+1)th object gets chosen
with the correct probability p, and with probability (1-p) one of the
earlier n objects is chosen, each with correct probability by the
induction hypothesis. So by induction, it does the right thing for all n.

This uses one random number per item in the list, but Python's RNG is
pretty fast. You can alternatively build a probability map or histogram
from the list (using more storage) and then select from it with a single
call to the RNG.
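
Paul's one-pass method is a form of weighted reservoir (online) sampling. A filled-out, self-contained version of his untested sketch, using (object, weight) pairs in place of the assumed `.prob` attribute:

```python
import random

def weighted_pick(items):
    """One-pass weighted pick over a list of (obj, weight) pairs."""
    selected = None
    total = 0.0
    for obj, w in items:
        total += w
        # obj displaces the current pick with probability w / total;
        # by the induction argument above, each item ends up selected
        # with probability proportional to its weight.
        if random.uniform(0, total) <= w:
            selected = obj
    return selected
```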

duncan smith

southof40 said:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This works but seems very clunky to me. Can anyone suggest a better
data structure which would support the 'weighted randomness' I'm
after ?

I'm not fixed on the idea of using a list - could be a dictionary,
tree whatever .

Thanks in advance.

Try googling for "alias table". Efficient if you're selecting many
random objects from the same mass function. Better than binary search
on the cumulative mass function in big-O terms (but maybe not in
practical terms for reasonable sized problems). Neither approach is as
efficient as the one you outlined, but the required list size might be
an issue for some sets of probabilities.

Duncan

duncan smith

duncan said:
Try googling for "alias table". Efficient if you're selecting many
random objects from the same mass function. Better than binary search
on the cumulative mass function in big-O terms (but maybe not in
practical terms for reasonable sized problems). Neither approach is as
efficient as the one you outlined, but the required list size might be
an issue for some sets of probabilities.

Duncan

BTW, the alias table approach is basically a means of getting round the
problem of needing large lists. Assuming your probabilities should be
0.7, 0.2 and 0.1 you could construct a list of 3 objects. The first
would be 100% car, the second would be 60% bike and 40% car, the third
would be 30% truck and 70% car. Choose an object at random, then the
vehicle type according to the mass function associated with the object.
The alias table approach only requires the generation of a single
uniform random variate and a single comparison (once you've constructed it).

Duncan
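
Duncan's worked example (column 1: 100% car; column 2: 60% bike / 40% car; column 3: 30% truck / 70% car) is exactly what Vose's alias-table construction produces for probabilities 0.7/0.2/0.1. A sketch (function and variable names are mine, not from the thread):

```python
import random

def build_alias_table(probs):
    """Vose's method: turn a probability list into (prob, alias) tables."""
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    prob = [1.0] * n                     # leftovers keep probability 1.0
    alias = list(range(n))
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]              # keep column s with this chance...
        alias[s] = l                     # ...otherwise fall through to l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias

def alias_draw(prob, alias):
    """One uniform variate and one comparison per draw, as described."""
    u = random.uniform(0, len(prob))
    i = min(int(u), len(prob) - 1)       # guard the endpoint u == n
    return i if (u - i) < prob[i] else alias[i]
```

For [0.7, 0.2, 0.1] this yields prob ≈ [1.0, 0.6, 0.3] with aliases [0, 0, 0], matching Duncan's three columns exactly; construction is O(n), each draw O(1).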

southof40

I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.
I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

That adds to a probability of 1.1, which is impossible.

I'm going to assume that bikes is a typo and should be 0.2.

cars = [car1, car2]
bikes = [bike1, bike2, bike3, bike4, bike5, bike6]
trucks = [truck1, truck2, truck3, truck4]

x = random.random()
if x < 0.7:
    return random.choice(cars)
elif x < 0.9:
    return random.choice(bikes)
else:
    return random.choice(trucks)

But surely this is not physically realistic? The probability of selecting
a car would normally depend on the number of cars, and not be set before
hand.

Nice solution, thanks.

southof40

southof40 wrote in (e-mail address removed) in gmane.comp.python.general:



Aside, all your probabilities add up to 1.1, they should add up to 1.


Aside, so 7 / 11 bikes, 3 / 11 cars and 1 / 11 trucks, are your
actual probabilities.

But to answer your question, you could create 3 list, and then
pick the list you draw from based on a random number then pick
the item from the list based on another random number:

r = ( random() * 11 )

if r  < 1:
  picklist = truck_list
elif r  < 4:
  picklist = bike_list
else:
  picklist = car_list

# now pick the final item from pick list.

Rob.

Thanks also - nice clean solution.

southof40

southof40 said:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.
I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.
I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This seems about right.  It's like a lottery where the more likely
winners have more tickets, but all tickets are the same.  Pick one to
pick the winner.

There's a very expensive, but flexible technique that effectively gives
some tickets a better chance than others.  You have to examine each
ticket individually, so this algorithm burns random numbers like
kindling:

import random

#===============================
def weighted_choice (candidates, weight):
    chosen = None
    total = 0
    for c in candidates:
        w = weight (c)
        total += w
        if random.randint (0, total-1) < w:
            chosen = c
    return chosen
#===============================

def test_weight (c):
    return {'c':7, 'b':3, 't':1}[c]

def item_count (s, target):
    return sum (1 for x in s if x==target)

test_candidates = 'c'*100 + 'b'*100 + 't'*100

for i in xrange (10):
    test = [weighted_choice (test_candidates, test_weight) for k in xrange (100)]
    for x in 'cbt':
        print x, item_count (test, x), '\t',
    print

        Mel.

I like this - makes altering the probabilities straightforward and
this is something I may want to do.