# How do I sample randomly based on some probability(wightage)?

Discussion in 'Python' started by Sumitava Mukherjee, May 26, 2009.

1. ### Sumitava MukherjeeGuest

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

Sumitava Mukherjee, May 26, 2009

2. ### Sumitava MukherjeeGuest

On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.
>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?

>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

Sumitava Mukherjee, May 26, 2009

3. ### Emile van SebilleGuest

On 5/26/2009 11:39 AM Sumitava Mukherjee said...
> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.

I don't know if there's a function for this somewhere, but you can

import random

choices = list('ABCDE')
weights = [0.895,0.567,0.765,0.890,0.60]

selections = list("".join([ choice*int(weight*1000) for choice,weight in
zip(choices,weights) ]))

random.shuffle(selections)

for randomchoice in selections:
dosomething(randomchoice)

Emile

>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?

Emile van Sebille, May 26, 2009
4. ### Arnaud DelobelleGuest

Sumitava Mukherjee <> writes:

> On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
>> Hi all,
>> I need to randomly sample from a list where all choices have weights
>> attached to them. The probability of them being choosen is dependent
>> on the weights.
>> If say Sample list of choices are [A,B,C,D,E] and weights of the same
>> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
>> want the likeliness of them being chosen be in the order : D>A>C>E>B

You mean A > D > C > E > B

>> In short I mean if prob of a H is .9 and probability of T be 0.1 then
>> if I draw 10 samples, 9 should be H and 1 should be T.
>>
>> I coudn't find a function in the module random that does so.
>> Please can someone guide me how the above could be implemented [either
>> through some function which exists and I don't know or pointers to
>> some code snippets which does so]?

>
>>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

If you do sampling without replacement, you need to know the exact
number of each of A, B, C, D, E in the sample, not just their relative
frequency.

--
Arnaud

Arnaud Delobelle, May 26, 2009
5. ### MelGuest

Sumitava Mukherjee wrote:

> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.
>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?

I've usually used something like (untested):

def chooser (objects, weights):
total = 0.0
for obj, weight in zip (objects, weights):
total += weight
if weight < total * random.random():
chosen = obj
return chosen

Works fine if runtime is not a great concern.

Mel.

Mel, May 26, 2009
6. ### George SakkisGuest

On May 26, 2:39 pm, Sumitava Mukherjee <> wrote:
> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.
>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?

1. Normalize the weights so that they sum up to 1.
2. Form the cumulative sequence S = [0, w0, w0+w1, w0+w1+w2, ..., sum
(w)==1.0]
3. Call random.random() -> x
4. Find the bucket in S that x belongs to, i.e. the i so that s <=
x < s[i+1] (you can do this fast using the bisect module).
5. Return choice
6. Test.
7. Submit homework

HTH,
George

George Sakkis, May 26, 2009
7. ### Guest

, May 26, 2009
8. ### Dave AngelGuest

Sumitava Mukherjee wrote:
> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.
>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?
>
>

Emile gave you a pretty elegant solution, assuming two things: The
choices can be represented by a single character each, and a roundoff
error of one part in a thousand is acceptable.

Of course, you can represent 256 choices in a 2.x character, and you
could switch to Unicode if that's not enough. And if 1000 isn't enough,
you could make it bigger, at the cost of needing a bigger table.

Here's my more-straightforward algorithm, not reduced to code.

Start with a list of probabilities. Replace each item with the sum of
all the items up to and including itself. (If you do it in-place, you'd
need to do it in reverse order, but if you use a copy, you could simply
replace each item with a sum() of the appropriate slice of the copy.

Anyway, now you've got a list of cumulative probabilites, with the last
item equalling 1.0 (pretty close). You might need to fudge that to
exactly 1.0, just in case.

I think I'd zip that list with the original list of items, so that you
have a single list of tuples.

Now to generate a random sample, call random.random() to get a value
between 0 and 1. Search the list for the first item greater than or
equal to your value, and return the other half of the tuple.

Dave Angel, May 26, 2009
9. ### Dennis Lee BieberGuest

On Tue, 26 May 2009 11:39:36 -0700 (PDT), Sumitava Mukherjee
<> declaimed the following in
gmane.comp.python.general:

> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>

Well... based on your numbers that should be A>D...

Probably going to be blasted for doing homework, but I was curious
what was in the libraries for the task...

-=-=-=-=-=-=-
import random
import bisect

choices = choices = [ "A", "B", "C", "D", "E" ]
weights = [ 0.895, 0.567, 0.765, 0.890, 0.60 ]

normweights = []
normsum = sum(weights)
normweights.append(weights[0] / normsum)
for i in range(1, len(weights)):
normweights.append((weights / normsum) + normweights[i-1])

for i in range(50):
r = random.random()
print choices[bisect.bisect(normweights, r)],

-=-=-=-=-=-=-=-
>pythonw -u "Script11.py"

C D D B E E D A A C D A C D A B C C E B A A C D A C C D B D D A D B A A
C A D A B C B A D D A D C C
>Exit code: 0
>pythonw -u "Script11.py"

C C A D A E D A E C B A D B C D C A C C E E E E D A D B D A E E D C B E
D A C E C B C C D B E D C E
>Exit code: 0

--
Wulfraed Dennis Lee Bieber KD6MOG

HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: )
HTTP://www.bestiaria.com/

Dennis Lee Bieber, May 27, 2009
10. ### Jaime Fernandez del RioGuest

On Tue, May 26, 2009 at 8:42 PM, Sumitava Mukherjee
<> wrote:
> On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
>
>>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

That means that, you'll have to code something that updates the sums
of probabilities after each extraction...

Would there be a smart way of doing this, of just adding the whole
updated list again is the best, or at least good enough?

--
(\__/)
( )
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
planes de dominación mundial.

Jaime Fernandez del Rio, May 27, 2009
11. ### Dennis Lee BieberGuest

On Wed, 27 May 2009 08:08:53 -0700, Scott David Daniels
<> declaimed the following in
gmane.comp.python.general:

> I am not sure why everybody is normalizing by dividing the weights.

Because it allows direct use of the value from random.random --
rather than slowing things down later by having to scale the result for
EACH sample drawn.

> spot = sample = random.random() * total

--
Wulfraed Dennis Lee Bieber KD6MOG

HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: )
HTTP://www.bestiaria.com/

Dennis Lee Bieber, May 28, 2009
12. ### Antoon PardonGuest

Op 2009-05-26, Arnaud Delobelle schreef <>:
> Sumitava Mukherjee <> writes:
>
>> On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
>>> Hi all,
>>> I need to randomly sample from a list where all choices have weights
>>> attached to them. The probability of them being choosen is dependent
>>> on the weights.
>>> If say Sample list of choices are [A,B,C,D,E] and weights of the same
>>> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
>>> want the likeliness of them being chosen be in the order : D>A>C>E>B

>
> You mean A > D > C > E > B
>
>>> In short I mean if prob of a H is .9 and probability of T be 0.1 then
>>> if I draw 10 samples, 9 should be H and 1 should be T.
>>>
>>> I coudn't find a function in the module random that does so.
>>> Please can someone guide me how the above could be implemented [either
>>> through some function which exists and I don't know or pointers to
>>> some code snippets which does so]?

>>
>>>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

>
> If you do sampling without replacement, you need to know the exact
> number of each of A, B, C, D, E in the sample, not just their relative
> frequency.

As far as I understand, you are given the exact number of each. It is one.
The numbers given are not relative frequencies of appearance but weights
to be attributed for picking them.

--
Antoon Pardon

Antoon Pardon, May 28, 2009
13. ### Sumitava MukherjeeGuest

On May 28, 3:52 pm, Antoon Pardon <> wrote:
> Op 2009-05-26, Arnaud Delobelle schreef <>:
>
>
>
> > Sumitava Mukherjee <> writes:

>
> >> On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
> >>> Hi all,
> >>> I need to randomly sample from a list where all choices have weights
> >>> attached to them. The probability of them being choosen is dependent
> >>> on the weights.
> >>> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> >>> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> >>> want the likeliness of them being chosen be in the order : D>A>C>E>B

>
> > You mean A > D > C > E > B

>
> >>> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> >>> if I draw 10 samples, 9 should be H and 1 should be T.

>
> >>> I coudn't find a function in the module random that does so.
> >>> Please can someone guide me how the above could be implemented [either
> >>> through some function which exists and I don't know or pointers to
> >>> some code snippets which does so]?

>
> >>>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

>
> > If you do sampling without replacement, you need to know the exact
> > number of each of A, B, C, D, E in the sample, not just their relative
> > frequency.

>
> As far as I understand, you are given the exact number of each. It is one..
> The numbers given are not relative frequencies of appearance but weights
> to be attributed for picking them.
>
> --
> Antoon Pardon

Yes, the numbers are weights attributed for picking them.

Sumitava Mukherjee, May 31, 2009
14. ### Sumitava MukherjeeGuest

On May 27, 8:08 pm, Scott David Daniels <> wrote:
> Sumitava Mukherjee wrote:
> > I need to randomly sample from a list where all choices have weights
> > attached to them. The probability of them being choosen is dependent
> > on the weights.

>
> I am not sure why everybody is normalizing by dividing the weights.
> This isa possibility (I had fun writing it).
>
> def sample_without_replacement(choices, weights):
>      '''Yield elements sampled w/o replacement by weighting'''
>      if len(weights) != len(choices):
>          raise ValueError('%d choices, but %d weights?' % (
>              len(choices), len(weights)))
>      if min(weights) < 0:
>          raise ValueError('Negative weights?: %s' % (
>                      [(i, w) for i, w in enumerate(weights) if w < 0]))
>
>      # look at only non-zero probabilities
>      combined = [(w, v) for w, v in zip(weights, choices) if w > 0]
>
>      # Go from highest probability down to reduce expected traversal
>      combined.sort(key=operator.itemgetter(0), reverse=True)
>
>      total = sum(w for w, v in combined) # sum(weights) also works
>      while combined:
>          spot = sample = random.random() * total
>          for n, (weight, choice) in enumerate(combined):
>              spot -= weight
>              if spot <= 0:
>                  break
>          else:
>              # n, (weight, choice) = 0, combined[0] # Highest probability
>              raise ValueError('%f left after choosing %f/%f?: %s' % (
>                                  spot, sample, total, combined))
>          yield choice
>          total -= weight
>          if weight > total * 256: # arbitrary choice for recalculation
>              # precision affected, rebuild
>              total = sum(w for w, v in combined)
>          del combined[n]
>      raise ValueError('Samplng more than %d without replacement?' % (
>                        sum(1 for w in weights if w > 0)))
>
> for n in range(10):
>      gen = sample_without_replacement('abcdef', [32,16,8,4,2,1])
>      print gen.next(), gen.next(), gen.next()
>
> --Scott David Daniels
>

Among all the help (all of which I really appreciate), I found your
solution closest to what I was expecting. Thanks a lot Scott.

Sumitava Mukherjee, May 31, 2009
15. ### Peter OttenGuest

Scott David Daniels wrote:

> raise ValueError('Samplng more than %d without replacement?' % (
> sum(1 for w in weights)))

That would be len(weights)

> There were two mistakes, both related to the definition of combined:
> (1) Combined only holds positive values, so testing in sum is silly.

but I think your first version was correct; I don't see were you remove the
0-items from weights.

Peter

Peter Otten, May 31, 2009