Is there a better way to simulate randomly choosing from weighted set

yay_frogs · Feb 27, 2006

Here is my problem: suppose there are, say, five events with these
probabilities:

event1 0.7
event2 0.1
event3 0.1
event4 0.05
event5 0.05

Note that the sum of the probabilities is 1.0. I would like a function that
simulates these events and returns an int to indicate which event
occurred: the function should statistically return 1 about 70% of the
time, 2 about 10% of the time, and so on.

I have figured out a way to do this, but I suspect my way is
suboptimal.

I build a vector of five elements that looks like:

{ 0.05, 0.05+0.05, 0.05+0.05+0.1, 0.05+0.05+0.1+0.1,
0.05+0.05+0.1+0.1+0.7 }
= { 0.05, 0.1, 0.2, 0.3, 1.0 }

I then generate a random float in the interval 0.0 ... 1.0, and if the
random float is in the range 0 to 0.05, I return event 5, and if the
random float is in the range 0.05-0.1, I return event 4, and so on.
(Actually, I should test for event 1 first since it is most common, but
I'm too lazy to re-type my example vector above.)
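
For reference, the lookup described above can be sketched in C++ roughly as follows. This is a minimal sketch, not the poster's actual code: the names `cumulative` and `pick` are made up for illustration, the events are kept in their natural order (index 0 is event1, so the most common event comes first), and the caller is assumed to supply a uniform random number u in [0, 1). A binary search replaces the chain of range tests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Build the running totals once. Up to rounding,
// {0.7, 0.1, 0.1, 0.05, 0.05} becomes {0.7, 0.8, 0.9, 0.95, 1.0}.
std::vector<double> cumulative(const std::vector<double>& prob)
{
    assert(!prob.empty());
    std::vector<double> cum(prob.size());
    std::partial_sum(prob.begin(), prob.end(), cum.begin());
    return cum;
}

// Map a uniform draw u in [0, 1) to a 0-based event index:
// the position of the first running total strictly greater than u.
std::size_t pick(const std::vector<double>& cum, double u)
{
    std::vector<double>::const_iterator it =
        std::upper_bound(cum.begin(), cum.end(), u);
    // Rounding can leave cum.back() slightly below 1.0;
    // in that case fall back to the last event.
    if (it == cum.end())
        --it;
    return static_cast<std::size_t>(it - cum.begin());
}
```

The table costs O(n) to build and each draw costs O(log n), so this pays off only when the same probabilities are used for many draws.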

For my real problem, I have to deal with many different cases where the
number of events to consider constantly varies, and I suspect there has
to be a better way than building a vector to represent the different
ranges a random variable can fall in and then seeing which range it
falls in.

So is there a better way?

Maxim Yegorushkin · Feb 27, 2006

You don't have to precompute the vector at all. Just modify your
algorithm to compute the ranges on the fly. Like this:

#include <stdio.h>
#include <stdlib.h>

int event(double const* prob, unsigned prob_len)
{
    unsigned r = rand();
    double p = 0;
    for(unsigned i = 0; p < 1 && i < prob_len; ++i)
    {
        p += prob[i]; // running total of the probabilities seen so far
        if(r < p * RAND_MAX)
            return i; // 0-based index of the event
    }
    // reached only if r == RAND_MAX or the probabilities sum to less than 1
    return -1;
}

int main()
{
    double const prob[] = { .7, .1, .1, .05, .05 };
    for(unsigned n = 1000000; n--;)
        printf("%d\n", event(prob, sizeof(prob) / sizeof(*prob)));
}

$ ./exp | awk '/0/{++n0} /1/{++n1} /2/{++n2} /3/{++n3} /4/{++n4} END {
print n0, n1, n2, n3, n4 }'
700612 99770 99752 50075 49791
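
Not part of the reply above, but for comparison: with a C++11 or later compiler, the same experiment can be sketched using the standard `<random>` header, where `std::discrete_distribution` does the weighted pick. The function name `tally` and its parameters are made up for this illustration, and the exact counts depend on the standard library implementation and the seed.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw `draws` events with the given weights and count how often
// each 0-based index comes up.
std::vector<unsigned long> tally(const std::vector<double>& weights,
                                 unsigned long draws, unsigned seed)
{
    assert(!weights.empty());
    std::mt19937 gen(seed);
    // The weights need not sum to 1; the distribution normalizes them.
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    std::vector<unsigned long> counts(weights.size(), 0);
    for (unsigned long n = 0; n < draws; ++n)
        ++counts[static_cast<std::size_t>(dist(gen))];
    return counts;
}
```

Calling `tally` with the weights { .7, .1, .1, .05, .05 } and a large number of draws should give counts in roughly the same proportions as the awk output above.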

Kai-Uwe Bux · Feb 27, 2006

a) The Alias Method of Walker (Google it or find it in TAOCP). This choice
is good if you have to draw many times from the same probability set.

b) Google the archive of this news group for Anglewyrm's hat container. This
solution is good when the probabilities change frequently.
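
To give an idea of option a), here is a rough sketch of an alias table in C++. It is an illustration under stated assumptions, not a reference implementation: the names `AliasTable`, `build_alias`, `pick_alias` and `implied_probability` are made up, the table is built with the common small/large worklist construction (often attributed to Vose), the probabilities are assumed to sum to 1, and the caller is assumed to supply two independent uniform numbers u1, u2 in [0, 1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column i yields event i with probability keep[i], otherwise alias[i].
struct AliasTable {
    std::vector<double> keep;
    std::vector<std::size_t> alias;
};

// Build the table in O(n). prob is assumed to sum to 1.
AliasTable build_alias(const std::vector<double>& prob)
{
    assert(!prob.empty());
    const std::size_t n = prob.size();
    AliasTable t;
    t.keep.assign(n, 1.0);
    t.alias.resize(n);
    std::vector<double> scaled(n);          // prob[i] * n, average is 1
    std::vector<std::size_t> small, large;  // columns below / not below 1
    for (std::size_t i = 0; i < n; ++i) {
        t.alias[i] = i;
        scaled[i] = prob[i] * n;
        (scaled[i] < 1.0 ? small : large).push_back(i);
    }
    // Top up each small column with mass taken from a large one.
    while (!small.empty() && !large.empty()) {
        const std::size_t s = small.back(); small.pop_back();
        const std::size_t l = large.back(); large.pop_back();
        t.keep[s] = scaled[s];
        t.alias[s] = l;
        scaled[l] -= 1.0 - scaled[s];
        (scaled[l] < 1.0 ? small : large).push_back(l);
    }
    // Columns left over keep their initial keep[i] == 1.0.
    return t;
}

// O(1) draw: u1 selects a column, u2 decides between event and alias.
std::size_t pick_alias(const AliasTable& t, double u1, double u2)
{
    std::size_t i = static_cast<std::size_t>(u1 * t.keep.size());
    if (i >= t.keep.size())
        i = t.keep.size() - 1;
    return u2 < t.keep[i] ? i : t.alias[i];
}

// Sanity check: the probability of event j that the table encodes.
double implied_probability(const AliasTable& t, std::size_t j)
{
    double p = 0.0;
    for (std::size_t i = 0; i < t.keep.size(); ++i) {
        if (i == j) p += t.keep[i];
        if (t.alias[i] == j) p += 1.0 - t.keep[i];
    }
    return p / t.keep.size();
}
```

Building the table is O(n) and each draw is O(1), which is why it suits many draws from a fixed set; any change to the probabilities means rebuilding the table.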

Best

Kai-Uwe Bux
