Weighted "random" selection from list of lists

J

Jesse Noller

Hello -

I'm probably missing something here, but I have a problem where I am
populating a list of lists like this:

list1 = [ 'a', 'b', 'c' ]
list2 = [ 'dog', 'cat', 'panda' ]
list3 = [ 'blue', 'red', 'green' ]

main_list = [ list1, list2, list3 ]

Once main_list is populated, I want to build a sequence from items
within the lists, "randomly" with a defined percentage of the sequence
coming for the various lists. For example, if I want a 6 item
sequence, I might want:

60% from list 1 (main_list[0])
30% from list 2 (main_list[1])
10% from list 3 (main_list[2])

I know how to pull a random sequence (using random()) from the lists,
but I'm not sure how to pick it with the desired percentages.

Any help is appreciated, thanks

-jesse
 
R

Ron Adam

Jesse Noller wrote:

60% from list 1 (main_list[0])
30% from list 2 (main_list[1])
10% from list 3 (main_list[2])

I know how to pull a random sequence (using random()) from the lists,
but I'm not sure how to pick it with the desired percentages.

Any help is appreciated, thanks

-jesse

Just add up the total of all lists.

total = len(list1)+len(list2)+len(list3)
n1 = .60 * total # number from list 1
n2 = .30 * total # number from list 2
n3 = .10 * total # number from list 3

You'll need to decide how to handle when a list has too few items in it.

Cheers,
Ron
 
P

Peter Otten

Jesse said:
I'm probably missing something here, but I have a problem where I am
populating a list of lists like this:

list1 = [ 'a', 'b', 'c' ]
list2 = [ 'dog', 'cat', 'panda' ]
list3 = [ 'blue', 'red', 'green' ]

main_list = [ list1, list2, list3 ]

Once main_list is populated, I want to build a sequence from items
within the lists, "randomly" with a defined percentage of the sequence
coming for the various lists. For example, if I want a 6 item
sequence, I might want:

60% from list 1 (main_list[0])
30% from list 2 (main_list[1])
10% from list 3 (main_list[2])

I know how to pull a random sequence (using random()) from the lists,
but I'm not sure how to pick it with the desired percentages.


If the percentages can be normalized to small integral numbers, just make a
pool where each list is repeated according to its weight, e. g.
list1 occurs 6, list2 3 times, and list3 once:

pools = [list1, list2, list3]
weights = [6, 3, 1]
sample_size = 10

weighted_pools = []
for p, w in zip(pools, weights):
weighted_pools.extend([p]*w)

sample = [random.choice(random.choice(weighted_pools))
for _ in xrange(sample_size)]


Another option is to use bisect() to choose a pool:

pools = [list1, list2, list3]
sample_size = 10

def isum(items, sigma=0.0):
for item in items:
sigma += item
yield sigma

cumulated_weights = list(isum([60, 30, 10], 0))
sigma = cumulated_weights[-1]

sample = []
for _ in xrange(sample_size):
pool = pools[bisect.bisect(cumulated_weights, random.random()*sigma)]
sample.append(random.choice(pool))

(all code untested)

Peter
 
S

Scott David Daniels

Jesse Noller wrote:
Once main_list is populated, I want to build a sequence from items
within the lists, "randomly" with a defined percentage of the sequence
coming for the various lists. For example:
60% from list 1 (main_list[0]), 30% from list 2 (main_list[1]), 10% from list 3 (main_list[2])


import bisect, random
main_list = [['a', 'b', 'c'],
['dog', 'cat', 'panda'],
['blue', 'red', 'green']]
weights = [60, 30, 10]

cumulative = []
total = 0
for index, value in enumerate(weights):
total += value
cumulative.append(total)

for i in range(20):
score = random.random() * total
index = bisect.bisect(cumulative, score)
print random.choice(main_list[index]),
 
S

Steven D'Aprano

Once main_list is populated, I want to build a sequence from items
within the lists, "randomly" with a defined percentage of the sequence
coming for the various lists. For example, if I want a 6 item
sequence, I might want:

60% from list 1 (main_list[0])
30% from list 2 (main_list[1])
10% from list 3 (main_list[2])

If you are happy enough to match the percentages statistically rather than
exactly, simply do something like this:

pr = random.random()
if pr < 0.6:
list_num = 0
elif pr < 0.9:
list_num = 1
else:
list_num = 2
return random.choice(main_list[list_num])

or however you want to extract an item.

On average, this will mean 60% of the items will come from list1 etc, but
for small numbers of trials, you may have significant differences.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top