How do I sample randomly based on some probability(wightage)?

Discussion in 'Python' started by Sumitava Mukherjee, May 26, 2009.

  1. Hi all,
    I need to randomly sample from a list where all choices have weights
    attached to them. The probability of them being choosen is dependent
    on the weights.
    If say Sample list of choices are [A,B,C,D,E] and weights of the same
    are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    want the likeliness of them being chosen be in the order : D>A>C>E>B

    In short I mean if prob of a H is .9 and probability of T be 0.1 then
    if I draw 10 samples, 9 should be H and 1 should be T.

    I coudn't find a function in the module random that does so.
    Please can someone guide me how the above could be implemented [either
    through some function which exists and I don't know or pointers to
    some code snippets which does so]?
     
    Sumitava Mukherjee, May 26, 2009
    #1
    1. Advertising

  2. On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
    > Hi all,
    > I need to randomly sample from a list where all choices have weights
    > attached to them. The probability of them being choosen is dependent
    > on the weights.
    > If say Sample list of choices are [A,B,C,D,E] and weights of the same
    > are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    > want the likeliness of them being chosen be in the order : D>A>C>E>B
    >
    > In short I mean if prob of a H is .9 and probability of T be 0.1 then
    > if I draw 10 samples, 9 should be H and 1 should be T.
    >
    > I coudn't find a function in the module random that does so.
    > Please can someone guide me how the above could be implemented [either
    > through some function which exists and I don't know or pointers to
    > some code snippets which does so]?


    >>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]
     
    Sumitava Mukherjee, May 26, 2009
    #2
    1. Advertising

  3. On 5/26/2009 11:39 AM Sumitava Mukherjee said...
    > Hi all,
    > I need to randomly sample from a list where all choices have weights
    > attached to them. The probability of them being choosen is dependent
    > on the weights.
    > If say Sample list of choices are [A,B,C,D,E] and weights of the same
    > are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    > want the likeliness of them being chosen be in the order : D>A>C>E>B
    >
    > In short I mean if prob of a H is .9 and probability of T be 0.1 then
    > if I draw 10 samples, 9 should be H and 1 should be T.


    I don't know if there's a function for this somewhere, but you can
    easily roll your own...

    import random

    choices = list('ABCDE')
    weights = [0.895,0.567,0.765,0.890,0.60]

    selections = list("".join([ choice*int(weight*1000) for choice,weight in
    zip(choices,weights) ]))

    random.shuffle(selections)

    for randomchoice in selections:
    dosomething(randomchoice)

    Emile



    >
    > I coudn't find a function in the module random that does so.
    > Please can someone guide me how the above could be implemented [either
    > through some function which exists and I don't know or pointers to
    > some code snippets which does so]?
     
    Emile van Sebille, May 26, 2009
    #3
  4. Sumitava Mukherjee <> writes:

    > On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
    >> Hi all,
    >> I need to randomly sample from a list where all choices have weights
    >> attached to them. The probability of them being choosen is dependent
    >> on the weights.
    >> If say Sample list of choices are [A,B,C,D,E] and weights of the same
    >> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    >> want the likeliness of them being chosen be in the order : D>A>C>E>B


    You mean A > D > C > E > B

    >> In short I mean if prob of a H is .9 and probability of T be 0.1 then
    >> if I draw 10 samples, 9 should be H and 1 should be T.
    >>
    >> I coudn't find a function in the module random that does so.
    >> Please can someone guide me how the above could be implemented [either
    >> through some function which exists and I don't know or pointers to
    >> some code snippets which does so]?

    >
    >>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]


    If you do sampling without replacement, you need to know the exact
    number of each of A, B, C, D, E in the sample, not just their relative
    frequency.

    --
    Arnaud
     
    Arnaud Delobelle, May 26, 2009
    #4
  5. Sumitava Mukherjee

    Mel Guest

    Sumitava Mukherjee wrote:

    > Hi all,
    > I need to randomly sample from a list where all choices have weights
    > attached to them. The probability of them being choosen is dependent
    > on the weights.
    > If say Sample list of choices are [A,B,C,D,E] and weights of the same
    > are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    > want the likeliness of them being chosen be in the order : D>A>C>E>B
    >
    > In short I mean if prob of a H is .9 and probability of T be 0.1 then
    > if I draw 10 samples, 9 should be H and 1 should be T.
    >
    > I coudn't find a function in the module random that does so.
    > Please can someone guide me how the above could be implemented [either
    > through some function which exists and I don't know or pointers to
    > some code snippets which does so]?


    I've usually used something like (untested):

    def chooser (objects, weights):
    total = 0.0
    for obj, weight in zip (objects, weights):
    total += weight
    if weight < total * random.random():
    chosen = obj
    return chosen

    Works fine if runtime is not a great concern.

    Mel.
     
    Mel, May 26, 2009
    #5
  6. On May 26, 2:39 pm, Sumitava Mukherjee <> wrote:
    > Hi all,
    > I need to randomly sample from a list where all choices have weights
    > attached to them. The probability of them being choosen is dependent
    > on the weights.
    > If say Sample list of choices are [A,B,C,D,E] and weights of the same
    > are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    > want the likeliness of them being chosen be in the order : D>A>C>E>B
    >
    > In short I mean if prob of a H is .9 and probability of T be 0.1 then
    > if I draw 10 samples, 9 should be H and 1 should be T.
    >
    > I coudn't find a function in the module random that does so.
    > Please can someone guide me how the above could be implemented [either
    > through some function which exists and I don't know or pointers to
    > some code snippets which does so]?


    1. Normalize the weights so that they sum up to 1.
    2. Form the cumulative sequence S = [0, w0, w0+w1, w0+w1+w2, ..., sum
    (w)==1.0]
    3. Call random.random() -> x
    4. Find the bucket in S that x belongs to, i.e. the i so that s <=
    x < s[i+1] (you can do this fast using the bisect module).
    5. Return choice
    6. Test.
    7. Submit homework ;)

    HTH,
    George
     
    George Sakkis, May 26, 2009
    #6
  7. Sumitava Mukherjee

    Guest

    , May 26, 2009
    #7
  8. Sumitava Mukherjee

    Dave Angel Guest

    Sumitava Mukherjee wrote:
    > Hi all,
    > I need to randomly sample from a list where all choices have weights
    > attached to them. The probability of them being choosen is dependent
    > on the weights.
    > If say Sample list of choices are [A,B,C,D,E] and weights of the same
    > are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    > want the likeliness of them being chosen be in the order : D>A>C>E>B
    >
    > In short I mean if prob of a H is .9 and probability of T be 0.1 then
    > if I draw 10 samples, 9 should be H and 1 should be T.
    >
    > I coudn't find a function in the module random that does so.
    > Please can someone guide me how the above could be implemented [either
    > through some function which exists and I don't know or pointers to
    > some code snippets which does so]?
    >
    >

    Emile gave you a pretty elegant solution, assuming two things: The
    choices can be represented by a single character each, and a roundoff
    error of one part in a thousand is acceptable.

    Of course, you can represent 256 choices in a 2.x character, and you
    could switch to Unicode if that's not enough. And if 1000 isn't enough,
    you could make it bigger, at the cost of needing a bigger table.

    Here's my more-straightforward algorithm, not reduced to code.

    Start with a list of probabilities. Replace each item with the sum of
    all the items up to and including itself. (If you do it in-place, you'd
    need to do it in reverse order, but if you use a copy, you could simply
    replace each item with a sum() of the appropriate slice of the copy.


    Anyway, now you've got a list of cumulative probabilites, with the last
    item equalling 1.0 (pretty close). You might need to fudge that to
    exactly 1.0, just in case.

    I think I'd zip that list with the original list of items, so that you
    have a single list of tuples.

    Now to generate a random sample, call random.random() to get a value
    between 0 and 1. Search the list for the first item greater than or
    equal to your value, and return the other half of the tuple.
     
    Dave Angel, May 26, 2009
    #8
  9. On Tue, 26 May 2009 11:39:36 -0700 (PDT), Sumitava Mukherjee
    <> declaimed the following in
    gmane.comp.python.general:

    > Hi all,
    > I need to randomly sample from a list where all choices have weights
    > attached to them. The probability of them being choosen is dependent
    > on the weights.
    > If say Sample list of choices are [A,B,C,D,E] and weights of the same
    > are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    > want the likeliness of them being chosen be in the order : D>A>C>E>B
    >

    Well... based on your numbers that should be A>D...

    Probably going to be blasted for doing homework, but I was curious
    what was in the libraries for the task...


    -=-=-=-=-=-=-
    import random
    import bisect

    choices = choices = [ "A", "B", "C", "D", "E" ]
    weights = [ 0.895, 0.567, 0.765, 0.890, 0.60 ]

    normweights = []
    normsum = sum(weights)
    normweights.append(weights[0] / normsum)
    for i in range(1, len(weights)):
    normweights.append((weights / normsum) + normweights[i-1])

    for i in range(50):
    r = random.random()
    print choices[bisect.bisect(normweights, r)],

    -=-=-=-=-=-=-=-
    >pythonw -u "Script11.py"

    C D D B E E D A A C D A C D A B C C E B A A C D A C C D B D D A D B A A
    C A D A B C B A D D A D C C
    >Exit code: 0
    >pythonw -u "Script11.py"

    C C A D A E D A E C B A D B C D C A C C E E E E D A D B D A E E D C B E
    D A C E C B C C D B E D C E
    >Exit code: 0

    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, May 27, 2009
    #9
  10. On Tue, May 26, 2009 at 8:42 PM, Sumitava Mukherjee
    <> wrote:
    > On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
    >
    >>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]


    That means that, you'll have to code something that updates the sums
    of probabilities after each extraction...

    Would there be a smart way of doing this, of just adding the whole
    updated list again is the best, or at least good enough?

    --
    (\__/)
    ( o_O)
    ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
    planes de dominación mundial.
     
    Jaime Fernandez del Rio, May 27, 2009
    #10
  11. On Wed, 27 May 2009 08:08:53 -0700, Scott David Daniels
    <> declaimed the following in
    gmane.comp.python.general:

    > I am not sure why everybody is normalizing by dividing the weights.


    Because it allows direct use of the value from random.random --
    rather than slowing things down later by having to scale the result for
    EACH sample drawn.

    > spot = sample = random.random() * total

    --
    Wulfraed Dennis Lee Bieber KD6MOG

    HTTP://wlfraed.home.netcom.com/
    (Bestiaria Support Staff: )
    HTTP://www.bestiaria.com/
     
    Dennis Lee Bieber, May 28, 2009
    #11
  12. Op 2009-05-26, Arnaud Delobelle schreef <>:
    > Sumitava Mukherjee <> writes:
    >
    >> On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
    >>> Hi all,
    >>> I need to randomly sample from a list where all choices have weights
    >>> attached to them. The probability of them being choosen is dependent
    >>> on the weights.
    >>> If say Sample list of choices are [A,B,C,D,E] and weights of the same
    >>> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    >>> want the likeliness of them being chosen be in the order : D>A>C>E>B

    >
    > You mean A > D > C > E > B
    >
    >>> In short I mean if prob of a H is .9 and probability of T be 0.1 then
    >>> if I draw 10 samples, 9 should be H and 1 should be T.
    >>>
    >>> I coudn't find a function in the module random that does so.
    >>> Please can someone guide me how the above could be implemented [either
    >>> through some function which exists and I don't know or pointers to
    >>> some code snippets which does so]?

    >>
    >>>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

    >
    > If you do sampling without replacement, you need to know the exact
    > number of each of A, B, C, D, E in the sample, not just their relative
    > frequency.


    As far as I understand, you are given the exact number of each. It is one.
    The numbers given are not relative frequencies of appearance but weights
    to be attributed for picking them.

    --
    Antoon Pardon
     
    Antoon Pardon, May 28, 2009
    #12
  13. On May 28, 3:52 pm, Antoon Pardon <> wrote:
    > Op 2009-05-26, Arnaud Delobelle schreef <>:
    >
    >
    >
    > > Sumitava Mukherjee <> writes:

    >
    > >> On May 26, 11:39 pm, Sumitava Mukherjee <> wrote:
    > >>> Hi all,
    > >>> I need to randomly sample from a list where all choices have weights
    > >>> attached to them. The probability of them being choosen is dependent
    > >>> on the weights.
    > >>> If say Sample list of choices are [A,B,C,D,E] and weights of the same
    > >>> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
    > >>> want the likeliness of them being chosen be in the order : D>A>C>E>B

    >
    > > You mean A > D > C > E > B

    >
    > >>> In short I mean if prob of a H is .9 and probability of T be 0.1 then
    > >>> if I draw 10 samples, 9 should be H and 1 should be T.

    >
    > >>> I coudn't find a function in the module random that does so.
    > >>> Please can someone guide me how the above could be implemented [either
    > >>> through some function which exists and I don't know or pointers to
    > >>> some code snippets which does so]?

    >
    > >>>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

    >
    > > If you do sampling without replacement, you need to know the exact
    > > number of each of A, B, C, D, E in the sample, not just their relative
    > > frequency.

    >
    > As far as I understand, you are given the exact number of each. It is one..
    > The numbers given are not relative frequencies of appearance but weights
    > to be attributed for picking them.
    >
    > --
    > Antoon Pardon


    Yes, the numbers are weights attributed for picking them.
     
    Sumitava Mukherjee, May 31, 2009
    #13
  14. On May 27, 8:08 pm, Scott David Daniels <> wrote:
    > Sumitava Mukherjee wrote:
    > > I need to randomly sample from a list where all choices have weights
    > > attached to them. The probability of them being choosen is dependent
    > > on the weights.

    >
    > I am not sure why everybody is normalizing by dividing the weights.
    > This isa possibility (I had fun writing it).
    >
    > def sample_without_replacement(choices, weights):
    >      '''Yield elements sampled w/o replacement by weighting'''
    >      if len(weights) != len(choices):
    >          raise ValueError('%d choices, but %d weights?' % (
    >              len(choices), len(weights)))
    >      if min(weights) < 0:
    >          raise ValueError('Negative weights?: %s' % (
    >                      [(i, w) for i, w in enumerate(weights) if w < 0]))
    >
    >      # look at only non-zero probabilities
    >      combined = [(w, v) for w, v in zip(weights, choices) if w > 0]
    >
    >      # Go from highest probability down to reduce expected traversal
    >      combined.sort(key=operator.itemgetter(0), reverse=True)
    >
    >      total = sum(w for w, v in combined) # sum(weights) also works
    >      while combined:
    >          spot = sample = random.random() * total
    >          for n, (weight, choice) in enumerate(combined):
    >              spot -= weight
    >              if spot <= 0:
    >                  break
    >          else:
    >              # n, (weight, choice) = 0, combined[0] # Highest probability
    >              raise ValueError('%f left after choosing %f/%f?: %s' % (
    >                                  spot, sample, total, combined))
    >          yield choice
    >          total -= weight
    >          if weight > total * 256: # arbitrary choice for recalculation
    >              # precision affected, rebuild
    >              total = sum(w for w, v in combined)
    >          del combined[n]
    >      raise ValueError('Samplng more than %d without replacement?' % (
    >                        sum(1 for w in weights if w > 0)))
    >
    > for n in range(10):
    >      gen = sample_without_replacement('abcdef', [32,16,8,4,2,1])
    >      print gen.next(), gen.next(), gen.next()
    >
    > --Scott David Daniels
    >


    Among all the help (all of which I really appreciate), I found your
    solution closest to what I was expecting. Thanks a lot Scott.
     
    Sumitava Mukherjee, May 31, 2009
    #14
  15. Sumitava Mukherjee

    Peter Otten Guest

    Scott David Daniels wrote:

    > raise ValueError('Samplng more than %d without replacement?' % (
    > sum(1 for w in weights)))


    That would be len(weights)

    > There were two mistakes, both related to the definition of combined:
    > (1) Combined only holds positive values, so testing in sum is silly.


    but I think your first version was correct; I don't see were you remove the
    0-items from weights.

    Peter
     
    Peter Otten, May 31, 2009
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jake
    Replies:
    0
    Views:
    562
  2. southof40
    Replies:
    12
    Views:
    740
    southof40
    Jun 21, 2010
  3. Cameron Simpson
    Replies:
    3
    Views:
    220
    southof40
    Jun 21, 2010
  4. Tim Chase
    Replies:
    0
    Views:
    76
    Tim Chase
    Feb 16, 2014
  5. Terry Reedy
    Replies:
    0
    Views:
    84
    Terry Reedy
    Feb 16, 2014
Loading...

Share This Page