# Skewed Random Number

Discussion in 'Java' started by shuchalle@hotmail.com, Sep 19, 2005.

1. ### Guest

Hello,

I am writing a program in Java. I have following requirements.

We have large data set points whose value will range from 100 to 1500.

We need to select 10% of dataset points randomly. So if there were
40000 data points - we need to select 4000 points on random basis.

Now you say - well that's easy. Well - here is the twist.

We need to "skew" the randomness so that more points are selected
towards higher number as in near to 1500 and less points are selected
toward lower end of spectrum that is 100. But all in all -still 10% (or
4000 out of 40000 dataset points) of total points out of data points
should be selected.

We can use some sort of "logarithmic skewage" - if there is such a
word.

Any clever ideas or hints would be much appreciated!

Regards,

AZXML

, Sep 19, 2005

2. ### Oliver Wong Guest

<> wrote in message
news:...
> Hello,
>
> I am writing a program in Java. I have following requirements.
>
> We have large data set points whose value will range from 100 to 1500.
>
> We need to select 10% of dataset points randomly. So if there were
> 40000 data points - we need to select 4000 points on random basis.
>
> Now you say - well that's easy. Well - here is the twist.
>
> We need to "skew" the randomness so that more points are selected
> towards higher number as in near to 1500 and less points are selected
> toward lower end of spectrum that is 100. But all in all -still 10% (or
> 4000 out of 40000 dataset points) of total points out of data points
> should be selected.
>
> We can use some sort of "logarithmic skewage" - if there is such a
> word.
>
> Any clever ideas or hints would be much appreciated!

Umm... what's the problem exactly? You seem to be under the assumption
that all random distributions are uniform; that's not the case.

I don't know what kind of distribution you want, but the Poisson
distribution, the Beta distribution with A=1, B=3, the Gamma distribution
with k=1, theta=2, the exponential distribution, and many others all have
the property that one end of the spectrum is more likely than the other.

Why don't you take a look at
http://en.wikipedia.org/wiki/Category:Continuous_distributions
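
As a rough illustration of the point about non-uniform distributions, here is a minimal sketch that samples a Beta(1, b) variate by inverting its CDF, which is 1 - (1 - x)^b, and then mirrors it so the high end becomes the likely one. The class and method names (`BetaSketch`, `betaOneB`, `betaBOne`, `toRange`) are made up for this example, and b = 3 is just the value mentioned above.

```java
import java.util.Random;

class BetaSketch {
    // Beta(1, b) by inverse CDF: F(x) = 1 - (1 - x)^b, so x = 1 - (1 - u)^(1/b).
    // For b > 1 this favors values near 0.
    static double betaOneB(Random rng, double b) {
        double u = rng.nextDouble();               // uniform in [0, 1)
        return 1.0 - Math.pow(1.0 - u, 1.0 / b);
    }

    // Mirroring gives Beta(b, 1), which favors values near 1.
    static double betaBOne(Random rng, double b) {
        return 1.0 - betaOneB(rng, b);
    }

    // Maps a value in [0, 1] onto the 100..1500 range from the question.
    static double toRange(double x) {
        return 100 + 1400 * x;
    }

    // Average of n mirrored draws, mapped to 100..1500.
    static double meanHigh(long seed, int n) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += toRange(betaBOne(rng, 3.0));
        }
        return sum / n;
    }

    public static void main(String[] args) {
        System.out.printf("mean of skewed draws: %.1f%n", meanHigh(42L, 100_000));
    }
}
```

The mean of Beta(3, 1) is 3/4, so the average should land near 1150 rather than at the midpoint of 800.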

- Oliver

Oliver Wong, Sep 19, 2005

3. ### Thomas Fritsch Guest

<> wrote:
> We have large data set points whose value will range from 100 to 1500.
>
> We need to select 10% of dataset points randomly. So if there were
> 40000 data points - we need to select 4000 points on random basis.
>
> Now you say - well that's easy. Well - here is the twist.
>
> We need to "skew" the randomness so that more points are selected
> towards higher number as in near to 1500 and less points are selected
> toward lower end of spectrum that is 100. But all in all -still 10% (or
> 4000 out of 40000 dataset points) of total points out of data points
> should be selected.
>
> We can use some sort of "logarithmic skewage" - if there is such a
> word.
>
> Any clever ideas or hints would be much appreciated!

A simple method for generating a random number that slightly favors large
values could be:

    double x = Math.random();   // uniformly distributed in [0,1)
    x = Math.pow(x, 0.9);       // skewed toward 1, still in [0,1)
    x = 1400 * x + 100;         // skewed toward 1500, in [100,1500)
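
To see how strong the skew is, the same three steps can be wrapped in a method with the exponent as a parameter. This is only a sketch: the names `PowerSkew`, `skewed` and `sampleMean` are invented here, and a seeded `java.util.Random` stands in for `Math.random()` so that runs are repeatable.

```java
import java.util.Random;

class PowerSkew {
    // Maps a uniform draw in [0,1) to a value in [100,1500).
    // Exponents below 1 push values toward the high end;
    // exponents above 1 push them toward the low end.
    static double skewed(Random rng, double exponent) {
        double u = rng.nextDouble();
        double x = Math.pow(u, exponent);
        return 1400 * x + 100;
    }

    // Average of n skewed draws for a given exponent.
    static double sampleMean(long seed, double exponent, int n) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += skewed(rng, exponent);
        }
        return sum / n;
    }

    public static void main(String[] args) {
        System.out.printf("exponent 0.9: mean = %.1f%n", sampleMean(42L, 0.9, 100_000));
        System.out.printf("exponent 0.3: mean = %.1f%n", sampleMean(42L, 0.3, 100_000));
    }
}
```

Since the mean of u^p for uniform u is 1/(p+1), an exponent of 0.9 should give an average near 837, only a little above the midpoint of 800, while 0.3 should give roughly 1177. A smaller exponent means a stronger pull toward 1500.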

--
"TFritsch\$t-online:de".replace(':','.').replace('\$','@')

Thomas Fritsch, Sep 19, 2005

4. ### Roedy Green Guest

On 19 Sep 2005 08:56:11 -0700, wrote or quoted :

>Any clever ideas or hints would be much appreciated!

You need a course in elementary probability and statistics.

Here are some hints.

See http://mindprod.com/jgloss/randomnumbers.htmls

Here is how nextGaussian works to produce a normal, bell-shaped
distribution:

    synchronized public double nextGaussian() {
        if (haveNextNextGaussian) {
            haveNextNextGaussian = false;
            return nextNextGaussian;
        } else {
            double v1, v2, s;
            do {
                v1 = 2 * nextDouble() - 1;   // between -1.0 and 1.0
                v2 = 2 * nextDouble() - 1;   // between -1.0 and 1.0
                s = v1 * v1 + v2 * v2;
            } while (s >= 1 || s == 0);
            double multiplier = Math.sqrt(-2 * Math.log(s)/s);
            nextNextGaussian = v2 * multiplier;
            haveNextNextGaussian = true;
            return v1 * multiplier;
        }
    }

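It works by drawing pairs of uniform random doubles until the point
(v1, v2) falls inside the unit circle, then converting that pair into two
independent Gaussian values (the polar method). One value is returned and
the other is cached for the next call.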

Another common distribution is called Poisson.

You need to be more precise about just how the elements are skewed
more toward the high end before you can come up with a formula to skew
them.

Here is the general idea of how you can do this:

1. Scale your random number from 0..1 over a more interesting domain of a
function with a simple multiplication.

2. Crank it through some non-linear formula, e.g. x squared, sqrt,
exp, log, log base n, x^n, a polynomial, a Chebyshev polynomial, a
parabola, ... Using exp(x), for example, will result in points
being dense at the low end and sparse at the high end.

3. Scale it back into a suitable range with a multiplication.
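
The three steps above could be sketched roughly as follows. Everything named here (`TransformSketch`, `draw`, `mirror`, the domain 0..3) is an arbitrary choice for illustration, the function is assumed to be increasing over the chosen domain, and the last step uses an offset as well as a multiplication so the result lands in 100..1500.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

class TransformSketch {
    static final double LO = 100, HI = 1500;

    // f is assumed to be increasing on [0, domainMax].
    static double draw(Random rng, double domainMax, DoubleUnaryOperator f) {
        double t = domainMax * rng.nextDouble();         // step 1: stretch 0..1 over the domain
        double y = f.applyAsDouble(t);                   // step 2: non-linear formula
        double y0 = f.applyAsDouble(0.0);
        double y1 = f.applyAsDouble(domainMax);
        return LO + (HI - LO) * (y - y0) / (y1 - y0);    // step 3: rescale to 100..1500
    }

    // Flips the range so a result that is dense near LO becomes dense near HI.
    static double mirror(double v) {
        return LO + HI - v;
    }

    // Average of n draws using exp over the domain 0..3, optionally mirrored.
    static double mean(long seed, int n, boolean mirrored) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < n; i++) {
            double v = draw(rng, 3.0, Math::exp);
            sum += mirrored ? mirror(v) : v;
        }
        return sum / n;
    }

    public static void main(String[] args) {
        System.out.printf("exp, as is:   mean = %.1f%n", mean(1L, 100_000, false));
        System.out.printf("exp, flipped: mean = %.1f%n", mean(1L, 100_000, true));
    }
}
```

With exp the unflipped draws cluster near 100, so the average falls well below the midpoint of 800. Flipping the result puts the dense end near 1500 instead.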

Different formulae will give you different skewings. If you don't
have a particular mathematical model you need, just pick a formula
that satisfies you intuitively. Graph the function and the
distribution.
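
If the goal is to pick 10% of the existing points rather than to generate new numbers, one possible approach is a weighted pick without replacement. The sketch below swaps in a different technique from the formulas above: each point gets a key log(u)/w, where w is its weight, and the k points with the largest keys are kept (the Efraimidis-Spirakis scheme). The names `WeightedPick`, `pick` and `meanAt` are invented for the example, and using the value itself as the weight is just one intuitive choice.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

class WeightedPick {
    // Returns the indices of k distinct points; points with larger
    // weights are more likely to be chosen. Weights are assumed > 0.
    static int[] pick(double[] values, int k, DoubleUnaryOperator weight, Random rng) {
        int n = values.length;
        double[] keys = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            double w = weight.applyAsDouble(values[i]);
            // log(u)/w ranks points the same way as u^(1/w) but avoids underflow;
            // 1 - nextDouble() lies in (0,1], so log never sees zero.
            keys[i] = Math.log(1.0 - rng.nextDouble()) / w;
            order[i] = i;
        }
        // Sort indices so the largest keys come first, then keep the first k.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -keys[i]));
        int[] chosen = new int[k];
        for (int j = 0; j < k; j++) {
            chosen[j] = order[j];
        }
        return chosen;
    }

    // Mean of the values at the given indices.
    static double meanAt(double[] values, int[] indices) {
        double sum = 0;
        for (int i : indices) {
            sum += values[i];
        }
        return sum / indices.length;
    }

    public static void main(String[] args) {
        Random rng = new Random(7L);
        double[] values = new double[40_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = 100 + 1400 * rng.nextDouble();   // made-up test data
        }
        int[] chosen = pick(values, values.length / 10, v -> v, rng);
        System.out.printf("picked %d points, mean value %.1f%n",
                chosen.length, meanAt(values, chosen));
    }
}
```

Exactly 4000 of the 40000 points come back, each at most once, and their average sits above that of the full data set. A steeper weight such as `v -> v * v` skews the pick further, and `v -> Math.log(v)` gives a gentler skew.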
--