Is there a better way to simulate randomly choosing from a weighted set?

yay_frogs

Here is my problem: suppose there are, say, five events with these
probabilities:

event1 0.7
event2 0.1
event3 0.1
event4 0.05
event5 0.05

Note that the sum of the probabilities is 1.0. I would like a function that
simulates these events and returns an int to indicate which event
occurred: the function should statistically return 1 about 70% of the
time, 2 about 10% of the time, and so on.

I have figured out a way to do this, but I suspect my way is
suboptimal.

I build a vector of five elements that looks like:

{ 0.05, 0.05+0.05, 0.05+0.05+0.1, 0.05+0.05+0.1+0.1,
0.05+0.05+0.1+0.1+0.7 }
= { 0.05, 0.1, 0.2, 0.3, 1.0 }

I then generate a random float in the interval 0.0 ... 1.0, and if the
random float is in the range 0 to 0.05, I return event 5, and if the
random float is in the range 0.05-0.1, I return event 4, and so on.
(Actually, I should test for event 1 first since it is most common, but
I'm too lazy to re-type my example vector above.)
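
For reference, the range lookup described above could be sketched in C++ roughly as follows. This is only an illustration, not the poster's actual code: the names cumulative and pick_event are made up here, the running totals are built in natural order (event 1 first) rather than the reversed order shown above, and std::upper_bound is assumed as the way to find the matching range.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Build the running totals, e.g. {.7, .1, .1, .05, .05} -> {.7, .8, .9, .95, 1.0}
// (up to floating-point rounding).
std::vector<double> cumulative(const std::vector<double>& prob)
{
    std::vector<double> cum(prob.size());
    std::partial_sum(prob.begin(), prob.end(), cum.begin());
    return cum;
}

// Map a uniform draw u in [0, 1) to a 1-based event number.
// Binary search for the first running total strictly greater than u.
int pick_event(const std::vector<double>& cum, double u)
{
    assert(!cum.empty());
    auto it = std::upper_bound(cum.begin(), cum.end(), u);
    if (it == cum.end())   // rounding left the last total at or below u
        --it;
    return static_cast<int>(it - cum.begin()) + 1;
}
```

The draw u would typically come from something like std::uniform_real_distribution<double>(0.0, 1.0) driven by std::mt19937. The binary search makes each lookup logarithmic in the number of events, but the totals still have to be rebuilt whenever the probabilities change.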

For my real problem, I have to deal with many different cases where the
number of events to consider constantly varies, and I suspect there has
to be a better way than building a vector to represent the different
ranges a random variable can fall in and then seeing which range it
falls in.

So is there a better way?
 
Maxim Yegorushkin

You don't have to precompute the vector at all. Just modify your
algorithm to compute the ranges on the fly. Like this:

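#include <stdio.h>
#include <stdlib.h>

// Returns the zero-based index of the event that occurred.
int event(double const* prob, unsigned prob_len)
{
    unsigned r = rand();   // uniform integer in [0, RAND_MAX]
    double p = 0;          // running total of the probabilities seen so far
    for(unsigned i = 0; p < 1 && i < prob_len; ++i)
    {
        p += prob[i];
        // scale by RAND_MAX + 1.0 so that r == RAND_MAX still hits the last range
        if(r < p * (RAND_MAX + 1.0))
            return i;
    }
    return -1; // reached only if the probabilities sum to less than 1
}

int main()
{
    double const prob[] = { .7, .1, .1, .05, .05 };
    for(unsigned n = 1000000; n--;)
        printf("%d\n", event(prob, sizeof(prob) / sizeof(*prob)));
}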

$ ./exp | awk '/0/{++n0} /1/{++n1} /2/{++n2} /3/{++n3} /4/{++n4} END {
print n0, n1, n2, n3, n4 }'
700612 99770 99752 50075 49791
 
Kai-Uwe Bux

a) The Alias Method of Walker (Google it or find it in TAOCP). This choice
is good if you have to draw many times from the same probability set.

b) Search the archive of this newsgroup for Anglewyrm's hat container. This
solution is good when the probabilities change frequently.
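
To give an idea of what option (a) involves, here is a minimal sketch of an alias table, built with the variant of Walker's method usually attributed to Vose. It is an illustration only, not code from TAOCP or from this thread: AliasTable, build_alias, sample and draw are hypothetical names, and the probabilities are assumed to be non-negative and to sum to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct AliasTable {
    std::vector<double> accept;      // chance of keeping column i itself
    std::vector<std::size_t> alias;  // event returned otherwise
};

// Setup takes time linear in the number of events (zero-based indices).
AliasTable build_alias(const std::vector<double>& prob)
{
    const std::size_t n = prob.size();
    assert(n > 0);
    AliasTable t{std::vector<double>(n, 1.0), std::vector<std::size_t>(n)};
    std::vector<double> scaled(n);          // prob[i] * n, so the average is 1
    std::vector<std::size_t> small, large;  // columns below / at or above 1
    for (std::size_t i = 0; i < n; ++i) {
        t.alias[i] = i;
        scaled[i] = prob[i] * n;
        (scaled[i] < 1.0 ? small : large).push_back(i);
    }
    // Top up each under-full column with mass taken from an over-full one.
    while (!small.empty() && !large.empty()) {
        const std::size_t s = small.back(); small.pop_back();
        const std::size_t l = large.back(); large.pop_back();
        t.accept[s] = scaled[s];
        t.alias[s] = l;
        scaled[l] -= 1.0 - scaled[s];
        (scaled[l] < 1.0 ? small : large).push_back(l);
    }
    // Columns left over keep accept == 1.0, which absorbs rounding error.
    return t;
}

// Pure lookup: column is uniform in [0, n), u is uniform in [0, 1).
std::size_t sample(const AliasTable& t, std::size_t column, double u)
{
    return u < t.accept[column] ? column : t.alias[column];
}

// One draw: pick a column, then keep it or take its alias.
template <class Rng>
std::size_t draw(const AliasTable& t, Rng& rng)
{
    std::uniform_int_distribution<std::size_t> column(0, t.accept.size() - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    return sample(t, column(rng), unit(rng));
}
```

With a generator such as std::mt19937, each call to draw costs two random numbers and one comparison regardless of the number of events. The table has to be rebuilt from scratch when any probability changes, which is why this option suits a fixed probability set.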


Best

Kai-Uwe Bux
 
