choosing random item from set with weighted probability

D

Digital Puer

I have an algorithmic question (not necessarily tied to Java).

Suppose I have the text from a newspaper and want to choose
a letter from the text with proportional probability. For example,
if the letter 'e' occurs, say, 50 times out of a text of 150
characters, then 'e' will be chosen with probability 1/3.

What is the best way to code this? I could set up an if-elseif ladder
like this:

x = random number [1 to total number of letters]
if (1 < x <= 10) /* 10, for example, because 'a' occurred 10 times */
choose 'a'
else if (10 < x <= 14) /* because 'b' occurred 4 times */
choose 'b'

This seems extremely tedious. Is there a better way?
 
C

Chris Smith

Digital Puer said:
I have an algorithmic question (not necessarily tied to Java).

Suppose I have the text from a newspaper and want to choose
a letter from the text with proportional probability. For example,
if the letter 'e' occurs, say, 50 times out of a text of 150
characters, then 'e' will be chosen with probability 1/3.

This seems extremely tedious. Is there a better way?

I don't really see any clever algorithm to do extremely well from any
objective standpoint. Here's how I would do it off-hand, though, which
is a little easier on the programmer than what you wrote. It also makes
it easier later change things to load the probabilities from a resource
or external file, which would be ideal.

The values in 'probabilities' should add up to one. Because rounding
error makes that hard to do precisely, the code will work for values
less than one as well. Values greater than one will shaft the latter
values, but probably not enough to matter for cases of rounding error.

Here goes:

private static Map<Character, Double> probabilities
= new HashMap<Character, Double>();

static
{
// Initialize probabilities...
probabilities.put('a', 1.0/6.0);
probabilities.put('b', 1.0/6.0);
probabilities.put('c', 1.0/6.0);
probabilities.put('d', 1.0/6.0);
probabilities.put('e', 1.0/3.0);
}

public static char getRandomChar()
{
/* Loop causes a retry in case rounding error messes thing up. */
while (true)
{
double p = Math.random();

for (Map.Entry<Character, Value> entry : probabilities)
{
if (p < entry.getValue()) return entry.getKey();
else p -= entry.getValue();
}
}
}

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
M

Matt Humphrey

Chris Smith said:
I don't really see any clever algorithm to do extremely well from any
objective standpoint. Here's how I would do it off-hand, though, which
is a little easier on the programmer than what you wrote. It also makes
it easier later change things to load the probabilities from a resource
or external file, which would be ideal.

The values in 'probabilities' should add up to one. Because rounding
error makes that hard to do precisely, the code will work for values
less than one as well. Values greater than one will shaft the latter
values, but probably not enough to matter for cases of rounding error.

Here goes:

private static Map<Character, Double> probabilities
= new HashMap<Character, Double>();

static
{
// Initialize probabilities...
probabilities.put('a', 1.0/6.0);
probabilities.put('b', 1.0/6.0);
probabilities.put('c', 1.0/6.0);
probabilities.put('d', 1.0/6.0);
probabilities.put('e', 1.0/3.0);
}

public static char getRandomChar()
{
/* Loop causes a retry in case rounding error messes thing up. */
while (true)
{
double p = Math.random();

for (Map.Entry<Character, Value> entry : probabilities)
{
if (p < entry.getValue()) return entry.getKey();
else p -= entry.getValue();
}
}
}

You can remove the rounding problem by mapping from the character directly
to the integral counts and selecting the probability value from 0 to the
total count - 1. Because the sum totals the range of p, the final Map.Entry
will always be chosen if one is not chosen sooner. This also makes
accumulating the counts easy--just keep adding them up along with a running
total.

Cheers,
Matt Humphrey (e-mail address removed) http://www.iviz.com/
 
D

David Segall

I have an algorithmic question (not necessarily tied to Java).

Suppose I have the text from a newspaper and want to choose
a letter from the text with proportional probability. For example,
if the letter 'e' occurs, say, 50 times out of a text of 150
characters, then 'e' will be chosen with probability 1/3.
If you actually use the text from the newspaper then if you select a
letter from it at random with equal probability you will already have
a one in three chance of selecting the letter 'e'.
What is the best way to code this? I could set up an if-elseif ladder
like this:

x = random number [1 to total number of letters]
if (1 < x <= 10) /* 10, for example, because 'a' occurred 10 times */
choose 'a'
else if (10 < x <= 14) /* because 'b' occurred 4 times */
choose 'b'

This seems extremely tedious. Is there a better way?
 
J

John B. Matthews

I have an algorithmic question (not necessarily tied to Java).

Suppose I have the text from a newspaper and want to choose
a letter from the text with proportional probability. For example,
if the letter 'e' occurs, say, 50 times out of a text of 150
characters, then 'e' will be chosen with probability 1/3.

Just read the text and choose a letter at random. Given a text of
150 characters with 50 occurrences of 'e', you'll get an 'e' about a
third of the time.

Alternatively, read the text and build a look-up table of letter
frequencies as you go.
 
M

marcus

Well, if I wanted to do a quick and dirty method I would read each
letter into a vector, get a random int limit vector.size(), and wally.
Vallah. wallagh. how do you spell it?

Anyhow, you could initialize your vector any way you wanted, the point
being you have a vector that is a real-world expression of your letter
ratios so getting a proportionally probable output is, ahem,
non-problematic.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top