Skewed Random Number

Discussion in 'Java' started by shuchalle@hotmail.com, Sep 19, 2005.

  1. Guest

    Hello,

    I am writing a program in Java and have the following requirements.

    We have a large set of data points whose values range from 100 to 1500.

    We need to select 10% of the data points randomly. So if there were
    40000 data points, we would need to select 4000 of them at random.

    Now you say: well, that's easy. Here is the twist.

    We need to "skew" the randomness so that more points are selected
    toward the high end (near 1500) and fewer toward the low end (near
    100). All in all, though, 10% of the points (4000 out of 40000)
    should still be selected.

    We could use some sort of "logarithmic skewage", if there is such a
    word.

    Any clever ideas or hints would be much appreciated!

    Regards,

    AZXML
     
    , Sep 19, 2005
    #1

  2. Oliver Wong Guest

    <> wrote in message
    news:...
    > Hello,
    >
    > I am writing a program in Java. I have following requirements.
    >
    > We have large data set points whose value will range from 100 to 1500.
    >
    > We need to select 10% of dataset points randomly. So if there were
    > 40000 data points - we need to select 4000 points on random basis.
    >
    > Now you say - well that's easy. Well - here is the twist.
    >
    > We need to "skew" the randomness so that more points are selected
    > towards higher number as in near to 1500 and less points are selected
    > toward lower end of spectrum that is 100. But all in all -still 10% (or
    > 4000 out of 40000 dataset points) of total points out of data points
    > should be selected.
    >
    > We can use some sort of "logarithmic skewage" - if there is such a
    > word.
    >
    > Any clever ideas or hints would be much appreciated!


    Umm... what's the problem exactly? You seem to be under the assumption
    that all random distributions are uniform; that's not the case.

    I don't know what kind of distribution you want, but the Poisson
    distribution, the Beta distribution with A=1, B=3, the Gamma distribution
    with k=1, theta=2, the exponential distribution, and many others all have
    the property that one end of the spectrum is more likely than the other.

    Why don't you take a look at
    http://en.wikipedia.org/wiki/Category:Continuous_distributions
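
As a rough illustration of drawing from one of these non-uniform distributions, here is a minimal sketch that uses inverse-transform sampling on an exponential distribution truncated to the 100..1500 range and mirrored so that values near 1500 are the most likely. The class name, the method name and the rate of 1/400 are all made up for this example; nothing in the thread prescribes them.

```java
import java.util.Random;

// Hypothetical helper: exponential distribution truncated to [lo, hi] and
// mirrored so that the high end is the most likely.
final class TruncatedExponentialDemo {

    /**
     * Returns a value in [lo, hi] whose density decays exponentially with
     * the distance from hi. A larger rate means a stronger pull toward hi.
     */
    static double sampleHighBiased(Random rng, double lo, double hi, double rate) {
        double span = hi - lo;
        double u = rng.nextDouble(); // uniform in [0,1)
        // Inverse CDF of an exponential distribution truncated to [0, span].
        double distanceFromTop =
                -Math.log(1.0 - u * (1.0 - Math.exp(-rate * span))) / rate;
        // Guard against floating-point overshoot at the far end.
        return hi - Math.min(distanceFromTop, span);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double sum = 0;
        int n = 100_000;
        for (int i = 0; i < n; i++) {
            sum += sampleHighBiased(rng, 100, 1500, 1.0 / 400);
        }
        // The mean should land well above the midpoint of 800.
        System.out.printf("mean of %d samples: %.1f%n", n, sum / n);
    }
}
```

Changing the rate changes how strongly the high end is favored; a rate close to zero approaches a uniform distribution.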

    - Oliver
     
    Oliver Wong, Sep 19, 2005
    #2

  3. <> wrote:
    > We have large data set points whose value will range from 100 to 1500.
    >
    > We need to select 10% of dataset points randomly. So if there were
    > 40000 data points - we need to select 4000 points on random basis.
    >
    > Now you say - well that's easy. Well - here is the twist.
    >
    > We need to "skew" the randomness so that more points are selected
    > towards higher number as in near to 1500 and less points are selected
    > toward lower end of spectrum that is 100. But all in all -still 10% (or
    > 4000 out of 40000 dataset points) of total points out of data points
    > should be selected.
    >
    > We can use some sort of "logarithmic skewage" - if there is such a
    > word.
    >
    > Any clever ideas or hints would be much appreciated!

    A simple method for generating a random number that favors large values a
    bit could be:
    double x = Math.random(); // uniformly distributed in [0,1)
    x = Math.pow(x, 0.9);     // skewed toward 1 (an exponent below 1 favors large values)
    x = 1400 * x + 100;       // skewed toward 1500 within [100,1500]
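
One possible way to apply the same power transform to the original question (picking exactly 10% of an existing data set) is to sort the data and use the skewed number as a position in the sorted array, retrying on duplicates. This is only a sketch: the class and method names, the exponent of 0.5 and the generated test data are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: picks distinct positions in an ascending-sorted array,
// preferring positions near the end (the larger values).
final class SkewedSelection {

    /** Returns a value in [0,1] that is skewed toward 1 when exponent < 1. */
    static double skewedUnit(Random rng, double exponent) {
        return Math.pow(rng.nextDouble(), exponent);
    }

    /**
     * Picks exactly count distinct indices in [0, size), favoring high ones.
     * Duplicates are simply redrawn, so this is slow if count is close to size.
     */
    static Set<Integer> pickIndices(int size, int count, double exponent, Random rng) {
        if (count > size) {
            throw new IllegalArgumentException("count exceeds size");
        }
        Set<Integer> chosen = new LinkedHashSet<>();
        while (chosen.size() < count) {
            int index = (int) (skewedUnit(rng, exponent) * size);
            chosen.add(Math.min(index, size - 1)); // pow may round up to exactly 1.0
        }
        return chosen;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double[] data = new double[40000];
        for (int i = 0; i < data.length; i++) {
            data[i] = 100 + 1400 * rng.nextDouble(); // made-up test data
        }
        Arrays.sort(data); // ascending, so high indices hold high values

        Set<Integer> picked = pickIndices(data.length, data.length / 10, 0.5, rng);
        double sum = 0;
        for (int index : picked) {
            sum += data[index];
        }
        System.out.printf("selected %d points, mean value %.1f%n",
                picked.size(), sum / picked.size());
    }
}
```

Because duplicates are redrawn, the result always contains exactly the requested number of points; the exponent controls only how strongly the high values are preferred.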

    --
    "TFritsch$t-online:de".replace(':','.').replace('$','@')
     
    Thomas Fritsch, Sep 19, 2005
    #3
  4. Roedy Green Guest

    On 19 Sep 2005 08:56:11 -0700, wrote or quoted :

    >Any clever ideas or hints would be much appreciated!


    You need a course in elementary probability and statistics.

    Here are some hints.

    See http://mindprod.com/jgloss/randomnumbers.htmls

    Here is how nextGaussian works to produce a normal (bell-shaped)
    distribution:

    synchronized public double nextGaussian() {
        if (haveNextNextGaussian) {
            haveNextNextGaussian = false;
            return nextNextGaussian;
        } else {
            double v1, v2, s;
            do {
                v1 = 2 * nextDouble() - 1; // between -1.0 and 1.0
                v2 = 2 * nextDouble() - 1; // between -1.0 and 1.0
                s = v1 * v1 + v2 * v2;
            } while (s >= 1 || s == 0);
            double multiplier = Math.sqrt(-2 * Math.log(s)/s);
            nextNextGaussian = v2 * multiplier;
            haveNextNextGaussian = true;
            return v1 * multiplier;
        }
    }

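    It works by the polar method: it keeps drawing two uniform random
    doubles between -1 and 1 until the point (v1, v2) falls inside the unit
    circle, then rescales both coordinates to get two independent normally
    distributed values. One is returned right away; the other is saved for
    the next call.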
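
If a bell-shaped falloff from the top of the range is acceptable, one option is to fold the Gaussian so that it peaks at 1500 and redraw anything that lands below 100. This is a sketch only: the class name, the method name and the sigma of 500 are assumptions for illustration, not something the thread specifies.

```java
import java.util.Random;

// Hypothetical helper: half-normal distribution anchored at the top of the range.
final class HalfNormalDemo {

    /**
     * Returns a value in [lo, hi] that is most likely near hi.
     * sigma controls how quickly the likelihood falls off below hi.
     */
    static double sample(Random rng, double lo, double hi, double sigma) {
        double value;
        do {
            // |Gaussian| is never negative, so value never exceeds hi.
            value = hi - Math.abs(rng.nextGaussian()) * sigma;
        } while (value < lo); // redraw the rare values that fall below the range
        return value;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 10; i++) {
            System.out.printf("%.1f%n", sample(rng, 100, 1500, 500));
        }
    }
}
```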

    Another common distribution is called Poisson.

    You need to be more precise about just how the elements are skewed
    more toward the high end before you can come up with a formula to skew
    them.

    Here is the general idea of how you can do this.

    1. Scale your random number 0..1 over a more interesting domain of a
    function with a simple multiplication.

    2. Crank it through some non-linear formula, e.g. x squared, sqrt,
    exp, log, log base n, x^n, a polynomial, a Chebyshev polynomial,
    a parabola, ... Using exp(x), for example, will result in points
    being dense at the low end and sparse at the high end.

    3. Scale it back into a suitable range with a multiplication.

    Different formulae will give you different skewings. If you don't
    have a particular mathematical model you need, just pick a formula
    that satisfies you intuitively. Graph the function and the
    distribution.
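
The three steps above can be sketched as follows, with a crude histogram standing in for a graph. Everything named here (SkewRecipe, skew, histogram, the domains [0,3] and [0,1], the seven bins) is invented for the example, and the rescaling in step 3 assumes the chosen function is strictly increasing on the chosen domain.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper illustrating the scale / transform / rescale recipe.
final class SkewRecipe {

    /** Maps a uniform u in [0,1) to [lo, hi] via f, assumed increasing on [a, b]. */
    static double skew(double u, double a, double b, DoubleUnaryOperator f,
                       double lo, double hi) {
        double x = a + u * (b - a);         // step 1: scale onto the domain of f
        double y = f.applyAsDouble(x);      // step 2: apply the non-linear formula
        double fa = f.applyAsDouble(a);
        double fb = f.applyAsDouble(b);
        double t = (y - fa) / (fb - fa);    // step 3: scale back to 0..1 ...
        return lo + t * (hi - lo);          // ... and then onto [lo, hi]
    }

    /** Counts how many skewed samples in [100, 1500] fall into each bin. */
    static int[] histogram(DoubleUnaryOperator f, double a, double b,
                           int bins, int samples, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[bins];
        for (int i = 0; i < samples; i++) {
            double v = skew(rng.nextDouble(), a, b, f, 100, 1500);
            int bin = Math.min(bins - 1, (int) ((v - 100) / 1400 * bins));
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // exp over [0,3]: samples pile up at the low end of 100..1500.
        System.out.println("exp:  " + Arrays.toString(
                histogram(Math::exp, 0, 3, 7, 100_000, 1L)));
        // sqrt over [0,1]: samples pile up at the high end of 100..1500.
        System.out.println("sqrt: " + Arrays.toString(
                histogram(Math::sqrt, 0, 1, 7, 100_000, 1L)));
    }
}
```

Swapping in a different formula or domain and looking at the bin counts is a quick way to judge whether a skew is intuitively satisfying before committing to it.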
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Again taking new Java programming contracts.
     
    Roedy Green, Sep 19, 2005
    #4
