Skewed Random Number

shuchalle

Hello,

I am writing a program in Java and have the following requirements.

We have a large set of data points whose values range from 100 to 1500.

We need to select 10% of the data points at random. So if there are
40000 data points, we need to select 4000 of them at random.

Now you say: well, that's easy. Here is the twist.

We need to "skew" the randomness so that more points are selected
toward the high end (near 1500) and fewer toward the low end (near 100).
All in all, though, 10% of the data points (4000 out of 40000) should
still be selected.

We could use some sort of "logarithmic skewage", if there is such a
word.

Any clever ideas or hints would be much appreciated!

Regards,

AZXML
 
Oliver Wong

Umm... what's the problem exactly? You seem to be under the assumption
that all random distributions are uniform; that's not the case.

I don't know what kind of distribution you want, but the Poisson
distribution, the Beta distribution with A=1, B=3, the Gamma distribution
with k=1, theta=2, the exponential distribution, and many others all have
the property that one end of the spectrum is more likely than the other.

Why don't you take a look at
http://en.wikipedia.org/wiki/Category:Continuous_distributions

- Oliver
 
Thomas Fritsch

A simple method for generating a random number that favors large values a
bit could be:

double x = Math.random(); // uniformly distributed in [0,1)
x = Math.pow(x, 0.9);     // skewed toward 1, within [0,1]
x = 1400 * x + 100;       // skewed toward 1500, within [100,1500]
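
The snippet above generates a skewed value; it does not by itself pick points from an existing data set. One way to carry the same Math.pow idea over to the "select exactly 10%" requirement is sketched below. It sorts the points, uses the skewed number as a position in the sorted array, and redraws any position that was already taken. The class name, the method name selectSkewed, and the exponent parameter are assumptions made for this illustration, and the exponent of 0.5 in main is just a stronger skew than 0.9, not a recommended value.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

final class SkewedSelection {

    /**
     * Picks k distinct points from values, favoring larger values.
     * An exponent in (0,1) skews toward the high end; 1.0 is uniform.
     * (Hypothetical helper, not part of any library.)
     */
    static double[] selectSkewed(double[] values, int k, double exponent, Random rng) {
        if (k < 0 || k > values.length) {
            throw new IllegalArgumentException("k must be between 0 and values.length");
        }
        if (exponent <= 0) {
            throw new IllegalArgumentException("exponent must be positive");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted); // ascending, so large values sit at large indices

        Set<Integer> chosen = new LinkedHashSet<>();
        while (chosen.size() < k) {
            double u = Math.pow(rng.nextDouble(), exponent); // skewed toward 1
            int index = (int) (u * sorted.length);
            index = Math.min(index, sorted.length - 1); // guard in case u rounds up to 1.0
            chosen.add(index); // an index that was already chosen is simply redrawn
        }

        double[] result = new double[k];
        int i = 0;
        for (int index : chosen) {
            result[i++] = sorted[index];
        }
        return result;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] data = new double[40000];
        for (int i = 0; i < data.length; i++) {
            data[i] = 100 + 1400 * rng.nextDouble(); // stand-in data in [100,1500)
        }
        double[] picked = selectSkewed(data, 4000, 0.5, rng);
        System.out.println("picked " + picked.length + " points, mean "
                + Arrays.stream(picked).average().orElse(Double.NaN));
    }
}
```

The redraw loop is cheap when only 10% of the points are wanted; for a selection close to 100% of the data it would slow down noticeably, and a different scheme would be a better fit.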
 
Roedy Green

You need a course in elementary probability and statistics.

Here are some hints.

See http://mindprod.com/jgloss/randomnumbers.htmls

Here is how nextGaussian works to produce a normal (bell-shaped)
distribution:

synchronized public double nextGaussian() {
    if (haveNextNextGaussian) {
        haveNextNextGaussian = false;
        return nextNextGaussian;
    } else {
        double v1, v2, s;
        do {
            v1 = 2 * nextDouble() - 1; // between -1.0 and 1.0
            v2 = 2 * nextDouble() - 1; // between -1.0 and 1.0
            s = v1 * v1 + v2 * v2;
        } while (s >= 1 || s == 0);
        double multiplier = Math.sqrt(-2 * Math.log(s)/s);
        nextNextGaussian = v2 * multiplier;
        haveNextNextGaussian = true;
        return v1 * multiplier;
    }
}

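It works by taking two uniform random doubles, rejecting pairs that fall
outside the unit circle, and scaling the survivors (the polar method).
Each pass yields two Gaussian values; the second is cached for the next
call.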
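
As a rough illustration of putting nextGaussian to work on the original question, the sketch below folds the bell curve at the upper bound so that values pile up near 1500 and thin out toward 100. The class name, the sample method, and the sigma value of 400 are all assumptions chosen for this example; a half-normal shape is only one of many possible skews.

```java
import java.util.Random;

final class HalfNormalNearTop {

    /**
     * Returns a value in [lo, hi] that is most likely to be near hi.
     * sigma controls how quickly the density falls off below hi.
     * (Hypothetical helper, not part of java.util.Random.)
     */
    static double sample(Random rng, double lo, double hi, double sigma) {
        if (sigma <= 0 || hi <= lo) {
            throw new IllegalArgumentException("need sigma > 0 and hi > lo");
        }
        double v;
        do {
            // |gaussian| is never negative, so v never exceeds hi
            v = hi - Math.abs(rng.nextGaussian()) * sigma;
        } while (v < lo); // redraw the occasional value that falls below lo
        return v;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 5; i++) {
            System.out.println(sample(rng, 100, 1500, 400));
        }
    }
}
```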

Another common distribution is called Poisson.

You need to be more precise about just how the elements are skewed
more toward the high end before you can come up with a formula to skew
them.

Here is the general idea of how you can do this.

1. Scale your random number 0..1 over a more interesting domain of a
function with a simple multiplication.

2. Crank it through some non-linear formula, e.g. x squared, sqrt,
exp, log, log base n, x^n, a polynomial, a Chebyshev polynomial,
a parabola, ... Doing this with exp(x), for example, will result in
points being dense at the low end and sparse at the high end.

3. Scale it back into a suitable range with a multiplication.

Different formulae will give you different skewings. If you don't
have a particular mathematical model you need, just pick a formula
that satisfies you intuitively. Graph the function and the
distribution.
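
The three steps above can be sketched roughly as follows. The shape method stretches a uniform number over a chosen domain, applies a monotonically increasing function, and rescales the result into [100,1500]. The class and method names, the domains [0,3] and [1,20], and the use of an offset alongside the multiplication are assumptions made for this illustration. With exp the points crowd the low end, as described; with log they crowd the high end, which is closer to what the original question asks for.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

final class ShapedRandom {

    /** Maps u in [0,1) through f (assumed increasing on the domain) into [outLo, outHi]. */
    static double shape(double u, DoubleUnaryOperator f,
                        double domainLo, double domainHi,
                        double outLo, double outHi) {
        double x = domainLo + u * (domainHi - domainLo);          // step 1: stretch over the domain
        double y = f.applyAsDouble(x);                            // step 2: non-linear formula
        double yLo = f.applyAsDouble(domainLo);
        double yHi = f.applyAsDouble(domainHi);
        return outLo + (y - yLo) / (yHi - yLo) * (outHi - outLo); // step 3: scale back
    }

    /** Average of n shaped samples in [100,1500], as a quick check of the skew direction. */
    static double meanOf(DoubleUnaryOperator f, double domainLo, double domainHi,
                         int n, long seed) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += shape(rng.nextDouble(), f, domainLo, domainHi, 100, 1500);
        }
        return sum / n;
    }

    public static void main(String[] args) {
        // A mean below 800 indicates crowding at the low end, above 800 at the high end.
        System.out.println("exp on [0,3]:  mean " + meanOf(Math::exp, 0, 3, 100000, 1L));
        System.out.println("log on [1,20]: mean " + meanOf(Math::log, 1, 20, 100000, 1L));
    }
}
```

Widening the domain makes either skew stronger, so the domain bounds are the knobs to adjust after graphing the result.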
 
