How do I sample randomly based on some probability(wightage)?

Sumitava Mukherjee · May 26, 2009

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

Sumitava Mukherjee · May 26, 2009

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

[Oh, I forgot to mention. I am looking for sampling without replacement.]

Click to expand...

Click to expand...

Emile van Sebille · May 26, 2009

On 5/26/2009 11:39 AM Sumitava Mukherjee said...

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I don't know if there's a function for this somewhere, but you can
easily roll your own...

import random

choices = list('ABCDE')
weights = [0.895,0.567,0.765,0.890,0.60]

selections = list("".join([ choice*int(weight*1000) for choice,weight in
zip(choices,weights) ]))

random.shuffle(selections)

for randomchoice in selections:
dosomething(randomchoice)

Emile

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

Arnaud Delobelle · May 26, 2009

Sumitava Mukherjee said:
Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

Click to expand...

You mean A > D > C > E > B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

[Oh, I forgot to mention. I am looking for sampling without replacement.]

Click to expand...

Click to expand...

If you do sampling without replacement, you need to know the exact
number of each of A, B, C, D, E in the sample, not just their relative
frequency.

Mel · May 26, 2009

Sumitava said:
Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

I've usually used something like (untested):

def chooser (objects, weights):
total = 0.0
for obj, weight in zip (objects, weights):
total += weight
if weight < total * random.random():
chosen = obj
return chosen

Works fine if runtime is not a great concern.

Mel.

George Sakkis · May 26, 2009

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

1. Normalize the weights so that they sum up to 1.
2. Form the cumulative sequence S = [0, w0, w0+w1, w0+w1+w2, ..., sum
(w)==1.0]
3. Call random.random() -> x
4. Find the bucket in S that x belongs to, i.e. the i so that s <=
x < s[i+1] (you can do this fast using the bisect module).
5. Return choice
6. Test.
7. Submit homework

HTH,
George

bearophileHUGS · May 26, 2009

Sumitava Mukherjee:

I need to randomly sample from a list where all choices have weights attached to them.<

Like this?
http://code.activestate.com/recipes/498229/

Bye,
bearophile

Dave Angel · May 26, 2009

Sumitava said:
Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

Emile gave you a pretty elegant solution, assuming two things: The
choices can be represented by a single character each, and a roundoff
error of one part in a thousand is acceptable.

Of course, you can represent 256 choices in a 2.x character, and you
could switch to Unicode if that's not enough. And if 1000 isn't enough,
you could make it bigger, at the cost of needing a bigger table.

Here's my more-straightforward algorithm, not reduced to code.

Start with a list of probabilities. Replace each item with the sum of
all the items up to and including itself. (If you do it in-place, you'd
need to do it in reverse order, but if you use a copy, you could simply
replace each item with a sum() of the appropriate slice of the copy.

Anyway, now you've got a list of cumulative probabilites, with the last
item equalling 1.0 (pretty close). You might need to fudge that to
exactly 1.0, just in case.

I think I'd zip that list with the original list of items, so that you
have a single list of tuples.

Now to generate a random sample, call random.random() to get a value
between 0 and 1. Search the list for the first item greater than or
equal to your value, and return the other half of the tuple.

Dennis Lee Bieber · May 27, 2009

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

Well... based on your numbers that should be A>D...

Probably going to be blasted for doing homework, but I was curious
what was in the libraries for the task...

-=-=-=-=-=-=-
import random
import bisect

choices = choices = [ "A", "B", "C", "D", "E" ]
weights = [ 0.895, 0.567, 0.765, 0.890, 0.60 ]

normweights = []
normsum = sum(weights)
normweights.append(weights[0] / normsum)
for i in range(1, len(weights)):
normweights.append((weights / normsum) + normweights[i-1])

for i in range(50):
r = random.random()
print choices[bisect.bisect(normweights, r)],

-=-=-=-=-=-=-=-

pythonw -u "Script11.py"

Click to expand...

C D D B E E D A A C D A C D A B C C E B A A C D A C C D B D D A D B A A
C A D A B C B A D D A D C C

Exit code: 0
pythonw -u "Script11.py"

Click to expand...

C C A D A E D A E C B A D B C D C A C C E E E E D A D B D A E E D C B E
D A C E C B C C D B E D C E

Exit code: 0

Click to expand...

--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/

Jaime Fernandez del Rio · May 27, 2009

[Oh, I forgot to mention. I am looking for sampling without replacement.]

Click to expand...

Click to expand...

That means that, you'll have to code something that updates the sums
of probabilities after each extraction...

Would there be a smart way of doing this, of just adding the whole
updated list again is the best, or at least good enough?

Dennis Lee Bieber · May 28, 2009

I am not sure why everybody is normalizing by dividing the weights.

Because it allows direct use of the value from random.random --
rather than slowing things down later by having to scale the result for
EACH sample drawn.

spot = sample = random.random() * total

--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/

Antoon Pardon · May 28, 2009

Op 2009-05-26 said:
Sumitava Mukherjee said:

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

Click to expand...

Click to expand...

You mean A > D > C > E > B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

Click to expand...

[Oh, I forgot to mention. I am looking for sampling without replacement.]

Click to expand...

Click to expand...

If you do sampling without replacement, you need to know the exact
number of each of A, B, C, D, E in the sample, not just their relative
frequency.

As far as I understand, you are given the exact number of each. It is one.
The numbers given are not relative frequencies of appearance but weights
to be attributed for picking them.

Sumitava Mukherjee · May 31, 2009

Op 2009-05-26, Arnaud Delobelle schreef <[email protected]>:

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

Click to expand...

Click to expand...

You mean A > D > C > E > B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.
I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?
[Oh, I forgot to mention. I am looking for sampling without replacement.]

Click to expand...

Click to expand...

If you do sampling without replacement, you need to know the exact
number of each of A, B, C, D, E in the sample, not just their relative
frequency.

Click to expand...

As far as I understand, you are given the exact number of each. It is one..
The numbers given are not relative frequencies of appearance but weights
to be attributed for picking them.

Yes, the numbers are weights attributed for picking them.

Sumitava Mukherjee · May 31, 2009

Sumitava said:
Sumitava said:

I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.

Click to expand...

I am not sure why everybody is normalizing by dividing the weights.
This isa possibility (I had fun writing it).

def sample_without_replacement(choices, weights):
'''Yield elements sampled w/o replacement by weighting'''
if len(weights) != len(choices):
raise ValueError('%d choices, but %d weights?' % (
len(choices), len(weights)))
if min(weights) < 0:
raise ValueError('Negative weights?: %s' % (
[(i, w) for i, w in enumerate(weights) if w < 0]))

# look at only non-zero probabilities
combined = [(w, v) for w, v in zip(weights, choices) if w > 0]

# Go from highest probability down to reduce expected traversal
combined.sort(key=operator.itemgetter(0), reverse=True)

total = sum(w for w, v in combined) # sum(weights) also works
while combined:
spot = sample = random.random() * total
for n, (weight, choice) in enumerate(combined):
spot -= weight
if spot <= 0:
break
else:
# n, (weight, choice) = 0, combined[0] # Highest probability
raise ValueError('%f left after choosing %f/%f?: %s' % (
spot, sample, total, combined))
yield choice
total -= weight
if weight > total * 256: # arbitrary choice for recalculation
# precision affected, rebuild
total = sum(w for w, v in combined)
del combined[n]
raise ValueError('Samplng more than %d without replacement?' % (
sum(1 for w in weights if w > 0)))

for n in range(10):
gen = sample_without_replacement('abcdef', [32,16,8,4,2,1])
print gen.next(), gen.next(), gen.next()

--Scott David Daniels
(e-mail address removed)

Among all the help (all of which I really appreciate), I found your
solution closest to what I was expecting. Thanks a lot Scott.

Peter Otten · May 31, 2009

Scott said:
raise ValueError('Samplng more than %d without replacement?' % (
sum(1 for w in weights)))

That would be len(weights)

There were two mistakes, both related to the definition of combined:
(1) Combined only holds positive values, so testing in sum is silly.

but I think your first version was correct; I don't see were you remove the
0-items from weights.

Peter

Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
How do i Do this function(dealing with arrays)	1	Dec 10, 2021
In R Shiny, How do I ensure variable value propagation within same code block in R?	0	Sep 29, 2022
I would like to use awk to calculate the total number of records processed	1	Aug 25, 2022
Select Eof extension files based on text list of filenames with if condition	0	May 4, 2022
how to make a tree with randomly selected html tags from an array in python?	0	Mar 10, 2013
FAQ 5.25 How do I randomly update a binary file?	0	Jan 11, 2011
I need help fixing my website	2	Oct 15, 2023

How do I sample randomly based on some probability(wightage)?

Sumitava Mukherjee

Sumitava Mukherjee

Emile van Sebille

Arnaud Delobelle

Mel

George Sakkis

bearophileHUGS

Dave Angel

Dennis Lee Bieber

Jaime Fernandez del Rio

Dennis Lee Bieber

Antoon Pardon

Sumitava Mukherjee

Sumitava Mukherjee

Peter Otten

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads