Generating 2 independent random numbers

Discussion in 'C++' started by bilgekhan, May 28, 2008.

  1. bilgekhan

    bilgekhan Guest

    What is the correct method for generating 2 independent random numbers?
    They will be compared whether they are equal.

    What about this method:

    int r1 = rand();
    int r2 = rand();
    bool f = r1 == r2;
    bilgekhan, May 28, 2008
    1. Advertisements

  2. bilgekhan

    bilgekhan Guest

    In my prev. posting I forgot to mention that the
    random numbers shall be in the range 0 to 99.
    So here my updated question:

    What is the correct method for generating
    2 independent random numbers in the range 0 to 99 ?
    They will immediately be compared for equality.

    What about this method:

    int r1 = rand() % 100;
    int r2 = rand() % 100;
    bool f = r1 == r2;
    bilgekhan, May 28, 2008
    1. Advertisements

  3. What Sam said.
    The rand() function returns a pseudo-random integer between 0 and RAND_MAX.

    Therefore, assuming it's excluding RAND_MAX, and including 0, and
    assuming each value is equiprobable (which is not specified by the man
    page of rand(3) on Linux), if RAND_MAX % 100 is not 0, then your
    randoms r1 and r2 won't be equiprobable.

    But this is already assuming too much, as further reading of the
    rand(3) man page.

    If you care so little about your randomness, you could as well write

    bool f=((rand()%100)==(rand()%100));
    Pascal J. Bourguignon, May 28, 2008
  4. bilgekhan

    Kai-Uwe Bux Guest

    Not good: the reseeding of the RNG just has unknown effects. If the RNG used
    by rand() is good, the reseeding very likely will make it worse. At the
    very least, you loose all the good things that the theory inspiring the RNG

    Your question has two aspects:

    a) How to generate independently distributed random numbers?

    b) How to generate random numbers in [0,100)?

    As for (a), any RNG is supposed to yield pseudo random numbers that look
    independent. Therefore, just using a decent RNG will take care of the
    independence requirement.

    As for (b), the %-trick is probably good enough for your purposes. However,
    beware that unless RAND_MAX+1 is divisible by 100.

    As for the immediate comparison part: if you discard r1 and r2 immediately
    after the comparison, and only keep f, then it suffices to generate one
    number r in [0,100) and do:

    bool f = ( r == 0 );

    because that will yield true with probability 0.01, too.


    Kai-Uwe Bux
    Kai-Uwe Bux, May 28, 2008
  5. bilgekhan

    Lionel B Guest

    As others have mentioned, most random number generators (and almost
    certainly your `rand()' call) generate *pseudo*-random sequences of
    numbers. "Real" random number sequences (and it's by no means trivial to
    say what that even means) are hard to come by and will generally involve
    some hardware-based solution. I'll assume that pseudo random is good
    enough for your purposes.

    But be aware that there is a huge variation in the "randomness" of
    sequences generated by pseudo random generators. The system-supplied
    generators (like `rand()') are often astonishingly non-random, and the
    (current) C++ standard allows them to be as cruddy as they please.
    Thankfully, the forthcoming C++ standard will include some higher quality
    generators such as the Mersenne Twister (recommended, Google it). But
    even that would not be good enough e.g. for cryptographic purposes.
    This is ok as long as you don't call this routine again so quickly that
    time(0) returns the same value as for the last call.
    This can be a bad idea, particularly for so-called "linear congruential"
    rngs (and there's a fair chance your `rand()' will be linear
    congruential). The reason for this is that mod-ing a number basically
    involves using just the lower-order bits, which for many rngs - and
    linear congruential ones in particular - are frequently the "least

    Safer is something along the lines of:

    int r1 = int(100.0*double(rand())/double(RAND_MAX));

    that is, you generate a uniform(ish) random double in the range [0,1),
    multiply it by 100 and take the integer part. This gives you a uniform
    (ish) random int in the range [0,99). It's safer because it relies more
    heavily on the higher-order bits of the number returned by your rng.

    Probably a bad idea to re-seed your rng with a number generated by the
    same rng... and pointless in any case. Seed once and generate away...
    Leave out that line.
    Again, as for r1 above.
    Lionel B, May 28, 2008
  6. bilgekhan

    bilgekhan Guest

    Have you tried this in a loop of say 1 million times and counted the hits?
    I think theoretically there should be about 10,000 matches.
    What result do you get?
    Maybe 0 ? :)
    bilgekhan, May 28, 2008
  7. bilgekhan

    Lionel B Guest

    Why do you think you might get 0 ?
    Lionel B, May 28, 2008
  8. bilgekhan

    bilgekhan Guest

    I did some experiments with the standard pseudo-RNG
    of the compiler and I come to the conclusion that it is
    impossible to let 1 RNG _correctly_ generate 2 independent
    random numbers.
    If someone manages this let me know pls, though
    I solved my original problem by the clever method of Kai-Uwe below.
    Unfortuately it's not divisable by 100, and
    unfortunately on my machine RAND_MAX is only 32767 :-(
    That's really a nice solution!
    So there is no need for 2 random numbers at all,
    and all the other associated problems with it.
    Thanks, Kai-Uwe!
    bilgekhan, May 28, 2008
  9. bilgekhan

    bilgekhan Guest

    Because it is the same sequence of the RNG... :)
    A seed sequence of rand() always generates
    RAND_MAX different random numbers.
    So within a sequence it cannot generate the same number twice.
    After generating RAND_MAX numbers the sequence
    starts over again to generate again the same sequence of numbers.

    Since the program is within the same sequence
    and since the two numbers are generated together
    it never can happen that these 2 numbers somehow match...

    Ie. rand() is predictable: if you generate RAND_MAX numbers
    and store them somewhere then you can look there what random
    number it will generate next. :)
    But this works only if you don't change the rand sequence
    by calling srand() a second time.

    I did some experiments and I come to the conclusion
    that it is not NOT possible to let 1 RNG _correctly_ generate
    2 independent random numbers.
    One has to solve it cleverly via just 1 random number only
    as was shown by Kai-Uwe Bux (cf. his posting).
    bilgekhan, May 28, 2008
  10. bilgekhan

    Lionel B Guest

    Really? The documentation for my system doesn't say that. It simply says:
    "The rand() function returns a pseudo-random integer between 0 and
    RAND_MAX". Maybe your rand() does what you say... there is no requirement
    for it to do so.

    In any case, that's irrelevant, since x !=y does not imply x%100 != y%
    100. Think about it - and try it:

    #include <iostream>
    #include <cstdlib>

    int main()
    const int N = 1000000;

    int hits = 0;
    for (int n=0;n<N;++n)
    if ((std::rand()%100)==(std::rand()%100)) ++hits;

    std::cout << "hits = " << hits << '\n';

    On my system this outputs:

    hits = 10195
    Lionel B, May 28, 2008
  11. No. It generates numbers between 0 and RAND_MAX. RAND_MAX has no
    relationship to how long the period of the rand() RNG is.

    Furthermore, even if you were right and it could only generate each
    value once, mapping the RNG output to values [0, 100) means that many
    (RAND_MAX/100) values from the RNG map to one value in the output, so
    the expression rand() % 100 could very well generate '5' twice in a
    False, or at best buggy. In fact, this is a dangerous property for an
    RNG, and no respectable PRNG has it. If your system's rand() RNG
    can't generate a value twice in the same period, report it to the
    appropriate vendor's bug list and get it fixed.
    This is true: iff you know the period of your rng (the number of
    samples it produces before repeating its list). However, the period
    of rand() is allowed to be much larger than RAND_MAX -- in which case
    it must, by simple counting logic, produce at least one value more
    than once.
    Indeed. It's better to understand the probability distribution you
    want -- often you can achieve it with only a single RNG, even for more
    complicated distributions. Multiple RNG samples aren't any "more
    random" than a single sample.
    Owen Jacobson, May 28, 2008
  12. bilgekhan

    Kai-Uwe Bux Guest

    If that is true, then the implementation of the RNG you are using sucks. A
    good pseudo random number generator will pass tests for statistical
    independence of subsequent returns.

    As far as I can see, the OP just uses the ordinary concept of independent
    random events/variables: knowing one does not help predicting the other.
    Why do you ask? do you have any reason to suspect otherwise?


    Kai-Uwe Bux
    Kai-Uwe Bux, May 28, 2008
  13. bilgekhan

    Lionel B Guest

    I assume the OP intends "independent" in the (perfectly well-defined)
    probability-theoretic sense.

    I guess you could argue that it doesn't make much sense to talk about
    independence of *pseudo* random numbers. In practice, however, you can
    perfectly well look for statistics such as serial (sample) correlation
    between successive random deviates generated by some PRNG - this is one
    intuitive notion of "how independent" successive deviates are. This would
    certainly be one (of many) tests routinely applied to gauge the quality
    of a PRNG.

    The whole question of "how random is a sequence of numbers" leads into
    astonishingly deep and murky mathematical/philosophical waters (of which
    I have little knowledge).
    Lionel B, May 28, 2008
  14. Yes, that's what I'd expect. And that's why the OP could expect too,
    "If [he] care so little about [his] randomness" so as not to read
    the rand(3) man page.
    Pascal J. Bourguignon, May 28, 2008
  15. bilgekhan

    bilgekhan Guest

    Hmm... this method has a drawback of slowing the program.
    If RAND_MAX is say 32767 then the window in the routine above
    is just 0.3% of the full range, so in 99.7% of the calls it will waste
    extra calls to rand()...
    bilgekhan, May 28, 2008
  16. bilgekhan

    Lionel B Guest

    No, if RAND_MAX is 32767 then (RAND_MAX / 100) * 100 = 32700, so you
    waste only about 0.2% of calls.
    Lionel B, May 28, 2008
  17. bilgekhan

    bilgekhan Guest

    Ok, but it should be no problem to get the length of the period... :)
    For example:
    Store, say 256 rand() call results in an array.
    Then infinetely call rand() and check after how many calls
    it repeats the same sequence stored in the array.... :)
    and then quit the loop.
    A simpler way would be of course looking in the TM of the compiler/library...
    (but OTOH who does RTFM ?... :)
    Independent as so far as 2 consecutive calls to rand()
    also generate twice the same random number as
    theoretically would be expected in 10% of 1 million cases
    when each time generating 2 numbers between 0 and 99.
    That's 1e6 / 100 = 10000 = 10% of 1e6.

    For achieving this independence one needs to switch
    to another seed sequence. But my experiments show
    that then the distribution is flawed as it deviates from
    the theoretical expectation. I managed to get it within
    +1.16% more matches by using the following ugly kludge:
    srand((1U + rand()) * (1U + rand()) * 4U /* - 1 */);
    as RAND_MAX on my machine is only 32767.
    But then I discarded the idea of wanting to have just one RNG
    generate such 2 independent random numbers.

    Using just one RNG, and within the same period (srand sequence),
    it is not possible to get the theoretically expected _correct_ number
    of matches. Ie. one would need two identical but independent RNGs
    and then both initialized with different seed values.
    bilgekhan, May 28, 2008
  18. bilgekhan

    James Kanze Guest

    It's somewhat implicit in what you just said, but you
    *shouldn't* use this if you need independent random sequences on
    different machines---the programs could be started within the
    same second (and in a lot of applications, typically will be).

    I generally use /dev/random if it's available, and a mixture of
    time(), pid() and the machine's MAC address (or failing that,
    its IP address) otherwise.
    Actually, just he reverse is true. While some linear congruent
    generators aren't very random in the low order bits, the good
    ones are, and the high order bits are never really very random:
    a very small number is always followed by a very large one.
    James Kanze, May 28, 2008
  19. bilgekhan

    Jerry Coffin Guest

    [ ... ]
    False. It usually _will_, but there's no guarantee of it.
    Even more thoroughly false. I've cut out a bunch more statements, but
    they all reflect the same misconception.

    You're assuming that RAND_MAX reflects not only the range, but also the
    period of the PRNG. While the standard's definition is loose enough to
    _allow_ that, it's most assuredly not required. I'd add that, IMO, such
    an implementation is _far_ from ideal.

    From a practical viewpoint, having the range and the period of the
    generator equal appears to be fairly unusual. For example, consider the
    following code:

    #include <stdlib.h>
    #include <iostream>

    int main() {

    int i;

    for (i=0; i<9; i++)
    std::cout << rand() << "\t";
    std::cout << "\n";

    for (; i<RAND_MAX; i++)

    for (i=0; i<9; i++)
    std::cout << rand() << "\t";
    std::cout << "\n";
    return 0;

    According to your statements, the two lines should be identical (with a
    possible offset). With the compilers I have handy, that's not the case.
    Here's the output from VC++ 7.1:

    41 18467 6334 26500 19169 15724 11478 29358 26962
    29246 17262 12374 18983 5450 31630 32102 30438 30332

    With g++ (4.3.1), these are the results:

    1481765933 1085377743 1270216262 1191391529
    812669700553475508 445349752 1344887256 730417256

    1039805031 1083326686 1331336834 716584787
    2033177353934031309 1079589791 1155192414 1372256943

    In neither case does repetition appear after producing RAND_MAX numbers.
    For most things, I also test with Comeau, but it normally uses the back-
    end compiler's implementation of rand(), so it would produce results
    identical to the back-end compiler's.
    Jerry Coffin, May 29, 2008
  20. bilgekhan

    Jerry Coffin Guest

    [ ... ]
    Hmmm...neither of the compilers I have handy seems to suffer from either
    of these problems. For example, consider the following two sequences
    generated by VC++ 7.1:

    41 18467 6334 26500 19169 15724 11478 29358 26962
    29246 17262 12374 18983 5450 31630 32102 30438 30332

    I see no particular pattern of a particularly large number following a
    small one. To help make possible problems in the low bits more apparent,
    here's the same sequence printed in hex:

    29 4823 18be 6784 4ae1 3d6c 2cd6 72ae 6952
    723e 436e 3056 4a27 154a 7b8e 7d66 76e6 767c

    At least right off, I don't see any obvious patterns on the low bits
    either (though I'll openly admit I haven't tried to do any sort of
    extensive analysis).
    Jerry Coffin, May 29, 2008
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.