How do I sample randomly based on some probability(wightage)?

S

Sumitava Mukherjee

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?
 
S

Sumitava Mukherjee

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?
[Oh, I forgot to mention. I am looking for sampling without replacement.]
 
E

Emile van Sebille

On 5/26/2009 11:39 AM Sumitava Mukherjee said...
Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I don't know if there's a function for this somewhere, but you can
easily roll your own...

import random

choices = list('ABCDE')
weights = [0.895,0.567,0.765,0.890,0.60]

selections = list("".join([ choice*int(weight*1000) for choice,weight in
zip(choices,weights) ]))

random.shuffle(selections)

for randomchoice in selections:
dosomething(randomchoice)

Emile


I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?
 
A

Arnaud Delobelle

Sumitava Mukherjee said:
Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

You mean A > D > C > E > B
In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?
[Oh, I forgot to mention. I am looking for sampling without replacement.]

If you do sampling without replacement, you need to know the exact
number of each of A, B, C, D, E in the sample, not just their relative
frequency.
 
M

Mel

Sumitava said:
Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

I've usually used something like (untested):

def chooser (objects, weights):
total = 0.0
for obj, weight in zip (objects, weights):
total += weight
if weight < total * random.random():
chosen = obj
return chosen

Works fine if runtime is not a great concern.

Mel.
 
G

George Sakkis

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

1. Normalize the weights so that they sum up to 1.
2. Form the cumulative sequence S = [0, w0, w0+w1, w0+w1+w2, ..., sum
(w)==1.0]
3. Call random.random() -> x
4. Find the bucket in S that x belongs to, i.e. the i so that s <=
x < s[i+1] (you can do this fast using the bisect module).
5. Return choice
6. Test.
7. Submit homework ;)

HTH,
George
 
D

Dave Angel

Sumitava said:
Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?
Emile gave you a pretty elegant solution, assuming two things: The
choices can be represented by a single character each, and a roundoff
error of one part in a thousand is acceptable.

Of course, you can represent 256 choices in a 2.x character, and you
could switch to Unicode if that's not enough. And if 1000 isn't enough,
you could make it bigger, at the cost of needing a bigger table.

Here's my more-straightforward algorithm, not reduced to code.

Start with a list of probabilities. Replace each item with the sum of
all the items up to and including itself. (If you do it in-place, you'd
need to do it in reverse order, but if you use a copy, you could simply
replace each item with a sum() of the appropriate slice of the copy.


Anyway, now you've got a list of cumulative probabilites, with the last
item equalling 1.0 (pretty close). You might need to fudge that to
exactly 1.0, just in case.

I think I'd zip that list with the original list of items, so that you
have a single list of tuples.

Now to generate a random sample, call random.random() to get a value
between 0 and 1. Search the list for the first item greater than or
equal to your value, and return the other half of the tuple.
 
D

Dennis Lee Bieber

Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B
Well... based on your numbers that should be A>D...

Probably going to be blasted for doing homework, but I was curious
what was in the libraries for the task...


-=-=-=-=-=-=-
import random
import bisect

choices = choices = [ "A", "B", "C", "D", "E" ]
weights = [ 0.895, 0.567, 0.765, 0.890, 0.60 ]

normweights = []
normsum = sum(weights)
normweights.append(weights[0] / normsum)
for i in range(1, len(weights)):
normweights.append((weights / normsum) + normweights[i-1])

for i in range(50):
r = random.random()
print choices[bisect.bisect(normweights, r)],

-=-=-=-=-=-=-=-
pythonw -u "Script11.py"
C D D B E E D A A C D A C D A B C C E B A A C D A C C D B D D A D B A A
C A D A B C B A D D A D C C
Exit code: 0
pythonw -u "Script11.py"
C C A D A E D A E C B A D B C D C A C C E E E E D A D B D A E E D C B E
D A C E C B C C D B E D C E
Exit code: 0
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
J

Jaime Fernandez del Rio

[Oh, I forgot to mention. I am looking for sampling without replacement.]

That means that, you'll have to code something that updates the sums
of probabilities after each extraction...

Would there be a smart way of doing this, of just adding the whole
updated list again is the best, or at least good enough?
 
D

Dennis Lee Bieber

I am not sure why everybody is normalizing by dividing the weights.

Because it allows direct use of the value from random.random --
rather than slowing things down later by having to scale the result for
EACH sample drawn.
spot = sample = random.random() * total
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
A

Antoon Pardon

Op 2009-05-26 said:
Sumitava Mukherjee said:
Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

You mean A > D > C > E > B
In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?
[Oh, I forgot to mention. I am looking for sampling without replacement.]

If you do sampling without replacement, you need to know the exact
number of each of A, B, C, D, E in the sample, not just their relative
frequency.

As far as I understand, you are given the exact number of each. It is one.
The numbers given are not relative frequencies of appearance but weights
to be attributed for picking them.
 
S

Sumitava Mukherjee

Op 2009-05-26, Arnaud Delobelle schreef <[email protected]>:


Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B
You mean A > D > C > E > B
In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.
I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?
[Oh, I forgot to mention. I am looking for sampling without replacement.]
If you do sampling without replacement, you need to know the exact
number of each of A, B, C, D, E in the sample, not just their relative
frequency.

As far as I understand, you are given the exact number of each. It is one..
The numbers given are not relative frequencies of appearance but weights
to be attributed for picking them.

Yes, the numbers are weights attributed for picking them.
 
S

Sumitava Mukherjee

Sumitava said:
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.

I am not sure why everybody is normalizing by dividing the weights.
This isa possibility (I had fun writing it).

def sample_without_replacement(choices, weights):
     '''Yield elements sampled w/o replacement by weighting'''
     if len(weights) != len(choices):
         raise ValueError('%d choices, but %d weights?' % (
             len(choices), len(weights)))
     if min(weights) < 0:
         raise ValueError('Negative weights?: %s' % (
                     [(i, w) for i, w in enumerate(weights) if w < 0]))

     # look at only non-zero probabilities
     combined = [(w, v) for w, v in zip(weights, choices) if w > 0]

     # Go from highest probability down to reduce expected traversal
     combined.sort(key=operator.itemgetter(0), reverse=True)

     total = sum(w for w, v in combined) # sum(weights) also works
     while combined:
         spot = sample = random.random() * total
         for n, (weight, choice) in enumerate(combined):
             spot -= weight
             if spot <= 0:
                 break
         else:
             # n, (weight, choice) = 0, combined[0] # Highest probability
             raise ValueError('%f left after choosing %f/%f?: %s' % (
                                 spot, sample, total, combined))
         yield choice
         total -= weight
         if weight > total * 256: # arbitrary choice for recalculation
             # precision affected, rebuild
             total = sum(w for w, v in combined)
         del combined[n]
     raise ValueError('Samplng more than %d without replacement?' % (
                       sum(1 for w in weights if w > 0)))

for n in range(10):
     gen = sample_without_replacement('abcdef', [32,16,8,4,2,1])
     print gen.next(), gen.next(), gen.next()

--Scott David Daniels
(e-mail address removed)

Among all the help (all of which I really appreciate), I found your
solution closest to what I was expecting. Thanks a lot Scott.
 
P

Peter Otten

Scott said:
raise ValueError('Samplng more than %d without replacement?' % (
sum(1 for w in weights)))

That would be len(weights)
There were two mistakes, both related to the definition of combined:
(1) Combined only holds positive values, so testing in sum is silly.

but I think your first version was correct; I don't see were you remove the
0-items from weights.

Peter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,230
Latest member
LifeBoostCBD

Latest Threads

Top