Pick items from a list with probability based upon a property of the list member?

southof40

I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of: cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This works but seems very clunky to me. Can anyone suggest a better
data structure which would support the 'weighted randomness' I'm
after?

I'm not fixed on the idea of using a list - it could be a dictionary,
tree, whatever.

Thanks in advance.

R.

Stefan Behnel

southof40, 20.06.2010 12:19:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This works but seems very clunky to me.

Why? It's a very simple, generic, easy to understand and fast solution to
the problem.

Stefan

Steven D'Aprano

I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

That adds to a probability of 1.1, which is impossible.

I'm going to assume that bikes is a typo and should be 0.2.


cars = [car1, car2]
bikes = [bike1, bike2, bike3, bike4, bike5, bike6]
trucks = [truck1, truck2, truck3, truck4]

x = random.random()
if x < 0.7:
    return random.choice(cars)
elif x < 0.9:
    return random.choice(bikes)
else:
    return random.choice(trucks)


But surely this is not physically realistic? The probability of selecting
a car would normally depend on the number of cars, and not be set
beforehand.

Rob Williscroft

southof40 wrote in (e-mail address removed) in gmane.comp.python.general:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

Aside, all your probabilities add up to 1.1, they should add up to 1.
I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

Aside, so 7/11 cars, 3/11 bikes and 1/11 trucks are your
actual probabilities.

But to answer your question, you could create three lists, then pick
the list you draw from based on one random number, and pick the item
from that list based on another random number:

r = random() * 11

if r < 1:
    picklist = truck_list
elif r < 4:
    picklist = bike_list
else:
    picklist = car_list

# now pick the final item from picklist.


Rob.

Mel

southof40 said:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This seems about right. It's like a lottery where the more likely
winners have more tickets, but all tickets are the same. Pick one to
pick the winner.

There's a very expensive, but flexible technique that effectively gives
some tickets a better chance than others. You have to examine each
ticket individually, so this algorithm burns random numbers like
kindling:



import random

#===============================
def weighted_choice (candidates, weight):
    chosen = None
    total = 0
    for c in candidates:
        w = weight (c)
        total += w
        if random.randint (0, total-1) < w:
            chosen = c
    return chosen
#===============================

def test_weight (c):
    return {'c':7, 'b':3, 't':1}[c]

def item_count (s, target):
    return sum (1 for x in s if x==target)

test_candidates = 'c'*100 + 'b'*100 + 't'*100

for i in xrange (10):
    test = [weighted_choice (test_candidates, test_weight) for k in xrange (100)]
    for x in 'cbt':
        print x, item_count (test, x), '\t',
    print




Mel.

Nobody

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This works but seems very clunky to me. Can anyone suggest a better
data structure which would support the 'weighted randomness' I'm
after ?

Calculate the cumulative probabilities and perform a binary search (e.g.
using the bisect module).
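
Nobody's suggestion might look like the sketch below (the vehicle names and weights are placeholders for illustration, not from the thread). The cumulative table is built once; each draw is then one random number plus an O(log N) binary search:

```python
import bisect
import random

def weighted_choice(candidates, weights):
    """Pick one candidate with probability proportional to its weight."""
    # Build the cumulative-weight table once; reuse it for many draws.
    cumulative = []
    total = 0.0
    for w in weights:
        total += w
        cumulative.append(total)
    # Draw a point in [0, total] and binary-search for its bucket;
    # the min() guards the (rare) endpoint x == total.
    x = random.uniform(0, total)
    i = min(bisect.bisect_right(cumulative, x), len(candidates) - 1)
    return candidates[i]
```

In a real program you would build the cumulative table once outside the function and pass it in, so repeated draws pay only the search cost.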

Steven D'Aprano

This seems about right. It's like a lottery where the more likely
winners have more tickets, but all tickets are the same. Pick one to
pick the winner.

It could be expensive to calculate though. Suppose the probabilities were:

0.608729
0.235012
0.156259

There's a very expensive, but flexible technique that effectively gives
some tickets a better chance than others. You have to examine each
ticket individually, so this algorithm burns random numbers like
kindling:



import random

#===============================
def weighted_choice (candidates, weight):
    chosen = None
    total = 0
    for c in candidates:
        w = weight (c)
        total += w
        if random.randint (0, total-1) < w:
            chosen = c
    return chosen
#===============================

Nice! But instead of randint(0, total-1), you can use randrange(0, total).

This only requires N random numbers (for N candidates), which isn't
really that excessive, but we can do better, and support non-integer
weights as well.

def weighted_choice(candidates, weights):
    """Choose between a list of candidates with a list of weights."""
    cumulative_weights = []
    running_total = 0
    for w in weights:
        running_total += w
        cumulative_weights.append(running_total)
    x = random.uniform(0, running_total)
    for item, cw in zip(candidates, cumulative_weights):
        if x <= cw:
            return item

This is O(N) on the number of candidates, and requires one call to random
no matter what N is.
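
This cumulative-weights scheme is essentially what `random.choices` (added to the standard library in Python 3.6, years after this thread) provides built-in; the vehicle names here are placeholders:

```python
import random

vehicles = ['car', 'bike', 'truck']   # placeholder objects
weights = [0.7, 0.2, 0.1]             # per the corrected probabilities above

picked = random.choices(vehicles, weights=weights, k=1)[0]
```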

Paul Rubin

southof40 said:
I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

You can do it with one pass through the list using a well-known online
algorithm (I don't remember what it's called). Untested code:

import random
tprob = 0  # total probability
for x in vehicle_list:
    tprob += x.prob  # x.prob is 0.7, 0.3, 0.1, or whatever
    if random.uniform(0, tprob) <= x.prob:
        selected = x

To see why this works, use mathematical induction. It's obviously right
if there is just one object in the list: tprob and x.prob are the same
thing, so the random number is always in range and that object gets
chosen. And suppose it works correctly for n objects. Then it also
works correctly for n+1 objects, since the (n+1)th object gets chosen
with the correct probability p, and with probability (1-p) one of the
earlier n objects is chosen, each with correct probability by the
induction hypothesis. So by induction, it does the right thing for all n.

This uses one random number per item in the list, but Python's RNG is
pretty fast. You can alternatively build a probability map or histogram
from the list (using more storage) and then select from it with a single
call to the RNG.
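
Paul's one-pass method is a form of weighted reservoir (online) sampling. A filled-out, self-contained version of his untested sketch, using (object, weight) pairs in place of the assumed `.prob` attribute:

```python
import random

def weighted_pick(items):
    """One-pass weighted pick over a list of (obj, weight) pairs."""
    selected = None
    total = 0.0
    for obj, w in items:
        total += w
        # obj displaces the current pick with probability w / total;
        # by the induction argument above, each item ends up selected
        # with probability proportional to its weight.
        if random.uniform(0, total) <= w:
            selected = obj
    return selected
```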

duncan smith

southof40 said:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.

I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This works but seems very clunky to me. Can anyone suggest a better
data structure which would support the 'weighted randomness' I'm
after ?

I'm not fixed on the idea of using a list - could be a dictionary,
tree whatever .

Thanks in advance.

Try googling for "alias table". Efficient if you're selecting many
random objects from the same mass function. Better than binary search
on the cumulative mass function in big-O terms (but maybe not in
practical terms for reasonable sized problems). Neither approach is as
efficient as the one you outlined, but the required list size might be
an issue for some sets of probabilities.

Duncan

duncan smith

duncan said:
Try googling for "alias table". Efficient if you're selecting many
random objects from the same mass function. Better than binary search
on the cumulative mass function in big-O terms (but maybe not in
practical terms for reasonable sized problems). Neither approach is as
efficient as the one you outlined, but the required list size might be
an issue for some sets of probabilities.

Duncan

BTW, the alias table approach is basically a means of getting round the
problem of needing large lists. Assuming your probabilities should be
0.7, 0.2 and 0.1 you could construct a list of 3 objects. The first
would be 100% car, the second would be 60% bike and 40% car, the third
would be 30% truck and 70% car. Choose an object at random, then the
vehicle type according to the mass function associated with the object.
The alias table approach only requires the generation of a single
uniform random variate and a single comparison (once you've constructed it).

Duncan
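
Duncan's worked example (column 1: 100% car; column 2: 60% bike / 40% car; column 3: 30% truck / 70% car) is exactly what Vose's alias-table construction produces for probabilities 0.7/0.2/0.1. A sketch (function and variable names are mine, not from the thread):

```python
import random

def build_alias_table(probs):
    """Vose's method: turn a probability list into (prob, alias) tables."""
    n = len(probs)
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    prob = [1.0] * n                     # leftovers keep probability 1.0
    alias = list(range(n))
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]              # keep column s with this chance...
        alias[s] = l                     # ...otherwise fall through to l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias

def alias_draw(prob, alias):
    """One uniform variate and one comparison per draw, as described."""
    u = random.uniform(0, len(prob))
    i = min(int(u), len(prob) - 1)       # guard the endpoint u == n
    return i if (u - i) < prob[i] else alias[i]
```

For [0.7, 0.2, 0.1] this yields prob ≈ [1.0, 0.6, 0.3] with aliases [0, 0, 0], matching Duncan's three columns exactly; construction is O(n), each draw O(1).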

southof40

I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.
I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.

That adds to a probability of 1.1, which is impossible.

I'm going to assume that bikes is a typo and should be 0.2.

cars = [car1, car2]
bikes = [bike1, bike2, bike3, bike4, bike5, bike6]
trucks = [truck1, truck2, truck3, truck4]

x = random.random()
if x < 0.7:
    return random.choice(cars)
elif x < 0.9:
    return random.choice(bikes)
else:
    return random.choice(trucks)

But surely this is not physically realistic? The probability of selecting
a car would normally depend on the number of cars, and not be set before
hand.

Nice solution, thanks.

southof40

southof40 wrote in (e-mail address removed) in gmane.comp.python.general:



Aside, all your probabilities add up to 1.1, they should add up to 1.


Aside, so 7 / 11 bikes, 3 / 11 cars and 1 / 11 trucks, are your
actual probabilities.

But to answer your question, you could create 3 list, and then
pick the list you draw from based on a random number then pick
the item from the list based on another random number:

r = ( random() * 11 )

if r  < 1:
  picklist = truck_list
elif r  < 4:
  picklist = bike_list
else:
  picklist = car_list

# now pick the final item from pick list.

Rob.

Thanks also - nice clean solution.

southof40

southof40 said:
I have a list of N Vehicle objects - the only possible vehicles are
cars, bikes, trucks.
I want to select an object from the list with a probability of : cars
0.7, bikes 0.3, trucks 0.1.
I've currently implemented this by creating another list in which each
car object from the original list appears 7 times, each bike 3 times
and each truck once. I then pick at random from that list.

This seems about right.  It's like a lottery where the more likely
winners have more tickets, but all tickets are the same.  Pick one to
pick the winner.

There's a very expensive, but flexible technique that effectively gives
some tickets a better chance than others.  You have to examine each
ticket individually, so this algorithm burns random numbers like
kindling:

import random

#===============================
def weighted_choice (candidates, weight):
    chosen = None
    total = 0
    for c in candidates:
        w = weight (c)
        total += w
        if random.randint (0, total-1) < w:
            chosen = c
    return chosen
#===============================

def test_weight (c):
    return {'c':7, 'b':3, 't':1}[c]

def item_count (s, target):
    return sum (1 for x in s if x==target)

test_candidates = 'c'*100 + 'b'*100 + 't'*100

for i in xrange (10):
    test = [weighted_choice (test_candidates, test_weight) for k in xrange (100)]
    for x in 'cbt':
        print x, item_count (test, x), '\t',
    print

        Mel.

I like this - makes altering the probabilities straightforward and
this is something I may want to do.