How do you guys limit rand() results to a range?

Discussion in 'C Programming' started by DFS, Jun 1, 2014.

  1. (snip)
    Did anyone here ever figure out how many random numbers you need
    to look at before the difference is statistically significant?

    For one, if you have N random numbers between 0 and N-1, what
    is the probability that all N are generated? (Assume that the
    generator is actually random.)

    If you flip a coin N times, what is the probability of getting N/2
    heads? (Note that it is zero for all odd N.)

    -- glen
     
    glen herrmannsfeldt, Jun 2, 2014
    #21
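
For the first of those questions the arithmetic is short: drawing N values
uniformly and independently from 0..N-1, all N possible values appear only
if the draws form a permutation, so the probability is N!/N^N (about
0.00036 for N = 10, and shrinking roughly like e^-N for larger N).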

  2. There's nothing wrong with using a library function in the usual
    manner, by calling it rather than by recompiling it from source.

    If you need to use it on a platform where it's not already available,
    then yes, recompiling from source is a decent approach (assuming
    the source code itself is portable).
     
    Keith Thompson, Jun 2, 2014
    #22

  3. I haven't, but for some values of N the difference can be *very*
    significant.

    One extreme case is RAND_MAX==32767 and N==32769. All but one of
    the 32769 values you want (assuming a decent rand() implementation)
    will appear equally often; the one remaining value, whichever one
    it is, will never appear. Or, with RAND_MAX==32767 and N==32767,
    one value will appear twice as often as the others. With slightly
    smaller N, several values will appear twice as often as the others.
    That's not hard to solve, but I'll leave it to someone else.
     
    Keith Thompson, Jun 2, 2014
    #23
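
One standard way to remove the bias Keith describes is rejection: throw
away raw rand() results that fall in the incomplete final block, so that
every value in [0, n) is backed by exactly the same number of raw outputs.
A minimal sketch, assuming only that 0 < n <= RAND_MAX + 1 and that rand()
itself is uniform (the function name is mine):

    #include <stdlib.h>

    /* Uniform value in [0, n).  Raw results at or above the largest
       multiple of n not exceeding RAND_MAX + 1 are rejected, so each
       residue is equally likely.  Requires 0 < n <= RAND_MAX + 1. */
    unsigned rand_below(unsigned n)
    {
        unsigned long range = (unsigned long)RAND_MAX + 1;
        unsigned long limit = range - range % n;  /* largest multiple of n */
        unsigned long r;

        do {
            r = (unsigned long)rand();
        } while (r >= limit);

        return (unsigned)(r % n);
    }

On average this costs fewer than two rand() calls per value, whatever n
is, though there is no fixed upper bound on any single call.
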
  4. At best, with R = RAND_MAX/N, you'll have some numbers generated with
    a frequency (R+1)/R times that of the others. If you know which numbers
    fall into each category, detecting that difference would require on the
    order of R^2 samples - which isn't a spectacularly large number by the
    standards of serious scientific simulations. A much larger sample size
    would be needed if determining which numbers fall into each category is
    part of what needs to be detected.
    For other, less demanding purposes, the uneven distribution is probably
    not much of a problem, at least for sufficiently large values of R.
     
    James Kuyper, Jun 2, 2014
    #24
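
To put numbers on that, assuming the usual rand() % N mapping: with
RAND_MAX == 32767 and N == 1000, R is 32; 768 of the 1000 values are
produced 33 times per 32768 raw outputs and the other 232 are produced
32 times, a frequency ratio of 33/32, or about 3%.
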
  5. It is, yes. As you saw, there was a portability bug in something as
    simple as uniform().
    Hmm.
    "Someone please write a random number generator, in C, with equivalent
    statistical properties to a unix random()". Depending on what environment
    you're working in, that could be easy, or it could be very difficult.
     
    Malcolm McLean, Jun 2, 2014
    #25
  6. The requirement will be to find (not necessarily write) a random number
    generator that works where the original one did. This need not have the
    same statistical properties. The minimum requirements might be rather
    modest. You don't know if the OP is implementing an X.509 certification
    authority or a live desktop wallpaper with randomly twinkling stars.
     
    Ben Bacarisse, Jun 2, 2014
    #26
  7. We know that rand() isn't good enough, so either he's very fussy about
    the aesthetic properties of the twinkling stars, or it's the former.

    So it's a lot easier to have an RNG which is a bit-shuffling function, known
    to produce identical results wherever we run it, shipped with the
    program as portable ANSI C source, and not in the same source file
    as an IO procedure which accesses /dev/random.
    Then as long as the company has the skills to compile C source, they
    should have the skills to use the program provided.
     
    Malcolm McLean, Jun 2, 2014
    #27
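
A self-contained "bit-shuffling" generator of the sort Malcolm describes
can be very small. The sketch below is Marsaglia's xorshift32, chosen
purely as an illustration of a deterministic generator that depends on
nothing but ISO C (C99 for <stdint.h>) and produces identical output
everywhere; it is not anything shipped by anyone in this thread, and it
is not suitable for cryptography:

    #include <stdint.h>

    /* Marsaglia's 32-bit xorshift: period 2^32 - 1 over nonzero states.
       Pure integer bit shuffling, so results are identical on every
       conforming implementation. */
    static uint32_t xs_state = 2463534242u;       /* any nonzero seed */

    void xs_seed(uint32_t seed)
    {
        xs_state = seed ? seed : 2463534242u;     /* zero is a fixed point */
    }

    uint32_t xs_next(void)
    {
        uint32_t x = xs_state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        xs_state = x;
        return x;
    }
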
  8. I did some work on random numbers used in a Monte Carlo
    simulator. The one thing to remember is, "A random number
    generator will try very hard to not be random".

    Testing randomness is very difficult. Randomness has many
    properties: it starts with the linear (one-dimensional) distribution,
    then the distribution of numbers in pairs, and so on.

    One good book on random numbers and their statistical
    distributions is "Statistical Distributions", available in a number
    of editions, by Hastings and Peacock et al.

    w..
     
    Walter Banks, Jun 2, 2014
    #28
  9. Absolutely don't.

    If you need high quality random numbers (and you probably don't),
    picking a random (PI) one from the net and cut'n'pasting it without
    understanding it thoroughly is a very bad idea. For starters, it may not
    even be portable to the platform you're porting to! But if you need
    HQRN, then you probably already know a better algorithm, and may even
    have been given one in the program specification.

    If you need medium quality random numbers (which you may), most systems
    now have random number functions which are quite adequate. These are
    much more likely to be well-debugged than one which you pulled at random
    from the net, and almost certain to give better performance on your
    implementation, with which, after all, it came packaged.

    If you only need low quality random numbers (which most of us do, most
    of the time), rand(), on a decent implementation, will be good enough.

    Richard
     
    Richard Bos, Jun 2, 2014
    #29
  10. No, we don't. To quote:

    # To my uninformed eye, both ways produce well-distributed random
    # results within a 10 million size test loop.

    That sounds to me like he thinks rand() _is_ good enough for him, and
    he's just wondering what the fuss is all about.

    He's probably right, too.

    Richard
     
    Richard Bos, Jun 2, 2014
    #30
  11. Who's "we", and how do "we" know this? I don't recall the OP ever saying
    that rand() is not good enough.

    <snip>
     
    Ben Bacarisse, Jun 2, 2014
    #31
  12. It's definitely good enough for the task that launched my exploration of
    random number generation in C (randomly pick a name from a list of 10).

    I enjoyed the discussion, and did some further reading. Looks like the
    Mersenne Twister is far better than rand().

    Here's an interesting site:
    http://www.random.org/
     
    DFS, Jun 2, 2014
    #32
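
For picking one of ten names, the plain modulo approach really is fine:
with RAND_MAX == 32767, eight of the ten indices come up 3277 times per
32768 raw outputs and the other two 3276 times, a bias of about 0.03%.
A sketch (the names and the time()-based seeding are illustrative, not
from the thread):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        /* Hypothetical list of 10 names. */
        static const char *names[10] = {
            "Alice", "Bob", "Carol", "Dave", "Eve",
            "Frank", "Grace", "Heidi", "Ivan", "Judy"
        };

        srand((unsigned)time(NULL));   /* casual seeding is enough here */
        printf("%s\n", names[rand() % 10]);
        return 0;
    }
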
  13. rand() may be a Mersenne Twister implementation (though the API will
    hamper the seeding).
     
    Ben Bacarisse, Jun 2, 2014
    #33
  14. That depends entirely upon what the randomly selected name is to be used
    for. You can't determine whether a given approach produces sufficiently
    equal probabilities, without first knowing precisely why it is (or is
    not) important that the probabilities of the different choices be equal.
     
    James Kuyper, Jun 2, 2014
    #34
  15. [...]

    Can you expand on that? I'm having trouble thinking of an
    interpretation of that last statement that makes sense.
    What am I missing?
     
    Keith Thompson, Jun 2, 2014
    #35
  16. I don't think there's a simple answer; it's going to depend rather
    heavily on the test being performed.

    For example, if we know which ones are bad, we can design our test as
    a test of a proportion. Here we will use N = 10000 and RAND_MAX = 32767.
    There should be 2768 numbers that are more likely than the remaining
    7232. If the numbers were correctly distributed we would expect this
    proportion to be 0.2768, while the actual proportion is 0.337890625.
    You only need 344
    samples to detect this with a confidence of 95% and a power of 80%. (For
    those who don't know, the confidence here is the chance of a sample from
    the correctly distributed numbers to pass the test while the power is
    the chance of the incorrectly distributed numbers to fail the test. A
    sample passes the test if we consider it to be correctly distributed).

    On the other hand, if the incorrectly distributed numbers are evenly
    split between the two groups, the actual proportion is now
    0.295654296875. You would need a sample size of 3492 to detect that
    difference with a confidence of 95% and a power of 80%.

    As for the coin-flip question: if N is odd, the probability of exactly
    N/2 heads is 0; if N is even, it is N! / (2^N (N/2)! (N/2)!).

    Martin Shobe
     
    Martin Shobe, Jun 2, 2014
    #36
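
Martin's two "actual" proportions can be checked with a few lines of
arithmetic rather than a simulation. This sketch just recomputes them
under the same assumptions (RAND_MAX == 32767, N == 10000, the usual
rand() % N mapping, and "evenly split" read as 1384 over-represented
values in each group):

    #include <stdio.h>

    int main(void)
    {
        const double range = 32768.0;  /* RAND_MAX + 1 */
        const double n = 10000.0;      /* desired range [0, n) */
        const double extra = 2768.0;   /* 32768 % 10000: values hit 4 times */

        /* Known group: all 2768 over-represented values. */
        double p_uniform = extra / n;             /* 0.2768 */
        double p_known = extra * 4.0 / range;     /* 0.337890625 */

        /* Split group: 1384 values hit 4 times, 1384 hit 3 times. */
        double p_split = (1384.0 * 4.0 + 1384.0 * 3.0) / range;

        printf("expected if uniform: %.4f\n", p_uniform);
        printf("actual, known group: %.9f\n", p_known);
        printf("actual, split group: %.12f\n", p_split);  /* 0.295654296875 */
        return 0;
    }
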
  17. I've looked into this, but I don't think it's possible, at least not
    perfectly in the general case. You just get the same problem,
    recursively.

    To take the simplest possible example, suppose you take one single bit
    from each rejected number[1], and string those into a new random number.
    You still have the same problem that _this_ number may be greater than
    29999. You can then reject it and start constructing a new one, but
    there is no way, with a truly uniform RNG, to guarantee that you will
    construct any acceptable number in any length of time.
    You simply can't construct, from a fixed number of bits, a uniformly
    random number that is less than X, unless X is a power of two. And
    bits are all we have to work with.
    Certainly. Even the above method gives you the option of having a tiny
    chance of never finishing (only with a perfect RNG), or an epsilon-but-
    not-zero difference from uniformity.

    Of course, most RNGs (and all PRNGs, which is what we're really talking
    about) _are not_ perfectly uniform, _will_ cycle through all possible
    outputs sooner or later (ideally several times in different orders), and
    therefore _will_ produce an acceptable output well before the cows come
    home. So in practice that's all right unless security reasons demand
    that the timing between random numbers is perfectly uniform - in which
    case, you shouldn't be taking hints from this thread, but giving them
    (and probably using completely different methods).

    Richard

    [1] Not the most significant one, you clot!
     
    Richard Bos, Jun 2, 2014
    #37
  18. I think you're right.

    If RAND_MAX==32767 then each call gives you exactly 15 bits of
    randomness. In principle, you should be able to treat a sequence
    of results as a continuous stream of randomness from which you can
    extract random numbers in any range you choose, using exactly as
    many bits as you need (not necessarily a whole number) for each one.
    In practice, though, since we can only store and manipulate discrete
    values (even in floating-point), I think any method must still be
    vulnerable to an unlikely sequence that would cause it to break.

    I suppose you could use arbitrary-precision rational numbers to store
    the pending randomness, but then you lose performance and risk running
    out of memory. (I don't seriously suggest that as a solution.)

    [...]
     
    Keith Thompson, Jun 2, 2014
    #38
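
The bit-recycling idea Richard and Keith are circling can be made
concrete: buffer the 15 random bits each rand() call supplies (assuming
RAND_MAX == 32767), draw just enough bits to cover the range, and reject
out-of-range values. The rejection loop is still there, exactly as both
posts point out; the only gain is that each retry costs the few bits
actually drawn rather than a whole 15-bit result. A sketch, with all
names my own:

    #include <stdlib.h>

    /* Serve rand() output one bit at a time (15 bits per call,
       assuming RAND_MAX == 32767). */
    static unsigned bit_buffer = 0;
    static int bits_left = 0;

    static unsigned next_bit(void)
    {
        if (bits_left == 0) {
            bit_buffer = (unsigned)rand();
            bits_left = 15;
        }
        bits_left--;
        return (bit_buffer >> bits_left) & 1u;
    }

    /* Uniform value in [0, n), 0 < n <= 32768: draw the smallest number
       of bits that can cover n, retry if the result is out of range. */
    unsigned uniform_below(unsigned n)
    {
        unsigned k = 0, limit = 1;
        while (limit < n) {            /* smallest power of two >= n */
            limit <<= 1;
            k++;
        }
        for (;;) {
            unsigned v = 0, i;
            for (i = 0; i < k; i++)
                v = (v << 1) | next_bit();
            if (v < n)
                return v;              /* otherwise retry with fresh bits */
        }
    }

There is still no fixed bound on how long one call can take, which is
exactly Richard's point: with bits as the raw material, you can trade the
bias away only for an unbounded (though rapidly vanishing) chance of
retrying.
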
  19. Not completely understanding it either, I'm reminded of what Knuth
    says about RNGs in TAOCP vol. 2 (it's been some time since I last
    read it): that you don't want to randomly select the algorithm for
    an RNG.

    In other words, you don't want randomness in the algorithm itself.

    It might be human (hopefully not CS) intuition that more
    randomness in the algorithm would generate more random results.

    Seems to me that Knuth and Walter disagree.

    -- glen
     
    glen herrmannsfeldt, Jun 2, 2014
    #39
  20. What's the abbreviation CS? (common sense? computing science?)
     
    Ike Naar, Jun 2, 2014
    #40
