Generating 2 independent random numbers

Jerry Coffin

[ ... ]
Ok, but it should be no problem to get the length of the period... :)
For example:
Store, say, 256 rand() call results in an array.
Then call rand() indefinitely and check after how many calls
it repeats the same sequence stored in the array... :)
and then quit the loop.
A simpler way would of course be looking in the manual of the compiler/library...
(but OTOH who does RTFM?... :)
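
A literal sketch of that scheme (only practical for toy generators; for
any real rand() the period is far too long for the loop to ever finish,
as the reply below points out):

#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <deque>
#include <iostream>
#include <vector>

int main() {
    const std::size_t window = 256;
    std::srand(1);

    // 1. Remember the first 'window' outputs.
    std::vector<int> ref(window);
    for (std::size_t i = 0; i < window; ++i)
        ref[i] = std::rand();

    // 2. Keep calling rand(), holding the most recent 'window' outputs,
    //    until that sliding window matches the stored one again.
    std::deque<int> recent;
    unsigned long long calls = 0;
    for (;;) {
        recent.push_back(std::rand());
        ++calls;
        if (recent.size() > window)
            recent.pop_front();
        if (recent.size() == window &&
            std::equal(recent.begin(), recent.end(), ref.begin()))
            break;
    }
    // 'calls' is the offset at which the stored window reappeared;
    // barring an accidental 256-value coincidence, it is a multiple
    // of the generator's period.
    std::cout << "window recurred after " << calls << " calls\n";
}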

Consider the Mersenne Twister, which has a period of 2**19937-1.

You won't live long enough to see a repeat -- and neither will the sun.

In fact, playing with numbers like this quickly leads to cosmological
questions. For example, assume you could generate a random number with a
single quantum leap. Is there enough energy in the universe to generate
that many random numbers? I haven't figured it up, but my immediate
guess is no. For reference, consider that the number of atoms in the
universe is something like 2**200, so you need approximately 2**19737
quantum leaps per atom to do the job. Offhand, I don't recall how long a
quantum leap takes, but my immediate guess is that the second law of
thermodynamics wins first (i.e. the universe goes into heat death).
 
James Kanze

[ ... ]
Because it is the same sequence of the RNG... :)
A seed sequence of rand() always generates
RAND_MAX different random numbers.
False. It usually _will_, but there's no guarantee of it.

As far as the standard goes, no. All the standard says is that
"The rand function computes a sequence of pseudo-random integers
in the range 0 to RAND_MAX." It doesn't say anything about the
quality of this sequence, nor even define what it means by
"pseudo-random integers". These issues are left to quality of
implementation considerations.

Presumably, something like:

#define RAND_MAX 60000
inline void srand( unsigned ) {}
inline int rand() { return 1 ; }

would be conforming, although it's really stretching the
definition of "pseudo-random integers", and it would certainly
not be considered to have acceptable quality, from a QoI point
of view. The C standard contains an example implementation,
which is however known to be "bad".

In general, if some particular "quality" of the random numbers
is important, you won't use std::rand(), but rather a known
generator with known characteristics. (The next version of the
C++ standard will contain a number of these.)
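
For reference, a minimal sketch using the facilities that ended up in
C++11's <random> header: std::mt19937 is the Mersenne Twister mentioned
earlier, and std::uniform_int_distribution does the range mapping without
the hand-written scaling discussed below.

#include <iostream>
#include <random>

int main() {
    // Mersenne Twister engine (period 2^19937 - 1), explicitly seeded.
    std::mt19937 engine(42);

    // Maps the engine's raw output onto 1..6 without modulo or
    // floating-point scaling tricks.
    std::uniform_int_distribution<int> die(1, 6);

    for (int i = 0; i < 10; ++i)
        std::cout << die(engine) << ' ';
    std::cout << '\n';
}
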
Even more thoroughly false. I've cut out a bunch more
statements, but they all reflect the same misconception.
You're assuming that RAND_MAX reflects not only the range, but also the
period of the PRNG. While the standard's definition is loose enough to
_allow_ that, it's most assuredly not required. I'd add that, IMO, such
an implementation is _far_ from ideal.
From a practical viewpoint, having the range and the period of
the generator equal appears to be fairly unusual.

Let's say that if the range is something like 32767, it would be
a very, very poor generator which didn't have a longer period.
A 32 bit linear congruential generator like that described in
"Random Number Generators: Good Ones Are Hard to Find" (Park and
Miller, CACM, Oct. 1988) will have both RAND_MAX and the period
equal to 2147483647 (or something similar).
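
A sketch of that Park-Miller "minimal standard" generator (a = 16807,
m = 2^31 - 1 = 2147483647, c = 0; the state stays in 1..m-1, so the
period is m - 1):

#include <cstdint>
#include <iostream>

// Park-Miller "minimal standard" generator (CACM, Oct. 1988).
// The state stays in 1 .. 2147483646; the period is m - 1.
std::uint32_t park_miller(std::uint32_t x) {
    const std::uint64_t a = 16807;
    const std::uint64_t m = 2147483647;        // 2^31 - 1, prime
    return static_cast<std::uint32_t>((a * x) % m);
}

int main() {
    std::uint32_t x = 1;                       // seed: any value in 1..m-1
    for (int i = 0; i < 10; ++i) {
        x = park_miller(x);
        std::cout << x << '\n';
    }
}
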
To check whether the range and the period of a given implementation's
rand() really are equal, consider the following code:
#include <stdlib.h>
#include <iostream>

int main() {
    srand(1);

    int i;

    // Print the first nine values of the sequence.
    for (i = 0; i < 9; i++)
        std::cout << rand() << "\t";
    std::cout << "\n";

    // Advance the generator: i continues from 9, so this loop makes
    // RAND_MAX - 9 further calls, for RAND_MAX calls in total.
    for (; i < RAND_MAX; i++)
        rand();

    // Print the next nine values.
    for (i = 0; i < 9; i++)
        std::cout << rand() << "\t";
    std::cout << "\n";
    return 0;
}
According to your statements, the two lines should be
identical (with a possible offset).

If the period of the generator is RAND_MAX, the middle loop
should execute RAND_MAX-9, not RAND_MAX, for the two lines to
be the same.

Not that that actually changes the concrete results. As you
say, the period is almost always longer than RAND_MAX. (On a
lot of systems, RAND_MAX is only 32767, presumably for some
obscure historical reason.)
 
James Kanze

[ ... ]
While some linear congruential generators aren't very random in
the low order bits, the good ones are, and the high order
bits are never really very random: a very small number is
always followed by a very large one.
Hmmm...neither of the compilers I have handy seems to suffer from either
of these problems. For example, consider the following two sequences
generated by VC++ 7.1:
41 18467 6334 26500 19169 15724 11478 29358 26962
29246 17262 12374 18983 5450 31630 32102 30438 30332
I see no particular pattern of a particularly large number following a
small one. To help make possible problems in the low bits more apparent,
here's the same sequence printed in hex:
29 4823 18be 6784 4ae1 3d6c 2cd6 72ae 6952
723e 436e 3056 4a27 154a 7b8e 7d66 76e6 767c
At least right off, I don't see any obvious patterns on the
low bits either (though I'll openly admit I haven't tried to
do any sort of extensive analysis).

Any linear congruential generator whose modulus is a power of 2 will
suffer from the problem in the low bits, with the period of the
nth bit being 2^n, which means that the low order bit will
alternate. Implementations using such generators, however, will
typically define RAND_MAX to be fairly small, and return the
high order bits. (The generator proposed in the C standard has
this problem, for example.) This problem isn't present if the
values of the constants (including m) are well chosen, however.
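
A minimal sketch of the effect, using the multiplier and increment from
the C standard's sample implementation (1103515245 and 12345) with a
modulus of 2^32: the lowest bit of the state simply alternates, which is
exactly why such implementations return only the higher bits.

#include <cstdint>
#include <iostream>

int main() {
    // LCG with a power-of-two modulus (2^32, via unsigned wraparound),
    // using the constants from the C standard's sample rand().
    std::uint32_t state = 1;
    for (int i = 0; i < 16; ++i) {
        state = state * 1103515245u + 12345u;
        // Bit 0 of the state alternates 0,1,0,1,...; bit 1 has period 4.
        std::cout << (state & 1u) << ' ';
    }
    std::cout << '\n';
}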

All linear congruential generators have a problem in the high order
bits, although it probably would require some analysis to
discover. Basically, given the formula
x[i+1] = ((a * x[i]) + b) % m;
if x[i] is less than m/a, then x[i+1] will be very large; the
"randomizing" effect of the modulo doesn't enter into play.
Globally, this will introduce a certain predictability into the
sequence; if you're using the common algorithm of:

(int)(N*(double)rand()/(double)(RAND_MAX+1))

to generate a random integer in the range 0...N-1, then there will
be a very perceptible bias after a 0; if you are using this with
N == 6, for example, to simulate dice, and you've just rolled a
1, the chance of rolling a second 1 is considerably less than
1/6.
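
A small sketch of the no-wraparound effect described above, using the
Park-Miller constants from the earlier sketch: whenever the state falls
below m/a (about 127773 here), the next state is exactly a times the
current one, so the modulo contributes nothing.

#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t a = 16807;
    const std::uint64_t m = 2147483647;        // 2^31 - 1 (Park-Miller)
    std::uint64_t x = 1;
    int shown = 0;

    while (shown < 5) {
        std::uint64_t prev = x;
        x = (a * x) % m;
        if (prev < m / a) {
            // prev < m/a  =>  a*prev < m, so the "% m" did nothing and
            // the successor is exactly a * prev.
            std::cout << prev << " -> " << x
                      << (x == a * prev ? "  (= a * prev)" : "") << '\n';
            ++shown;
        }
    }
}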

Again, it's quite possible for an implementation to "mask" this
effect, at least partially, by returning only the low order
bits, or some bits in the middle if it also suffers from the
first problem. But this masking only means that the problem
won't be trivially visible, not that it isn't there.
 
Lionel B

[...]
This can be a bad idea, particularly for so-called "linear
congruential" rngs (and there's a fair chance your `rand()' will be
linear congruential). The reason for this is that mod-ing a number
basically involves using just the lower-order bits, which for many rngs
- and linear congruential ones in particular - are frequently the
"least random".
Safer is something along the lines of:
int r1 = int(100.0*double(rand())/double(RAND_MAX));
that is, you generate a uniform(ish) random double in the range [0,1],
multiply it by 100 and take the integer part. This gives you a uniform
(ish) random int in the range 0..99 (strictly, dividing by RAND_MAX+1
rather than RAND_MAX avoids the corner case rand() == RAND_MAX, which
would yield 100). It's safer because it relies more heavily on the
higher-order bits of the number returned by your rng.

Actually, just the reverse is true.

Just to be pedantic, my statement: "...it relies more heavily on the
higher-order bits of the number returned by your rng" is true. You were
questioning my (implicit) assumption that higher order bits are likely to
be more random for a linear congruential generator.
While some linear congruential
generators aren't very random in the low order bits, the good ones are,
Maybe.

and the high order bits are never really very random: a very small
number is always followed by a very large one.

I'd not noticed that; do you have a reference?

As an extreme example, I once played around with the "quick and dirty"
PRNG from Numerical Recipes - it's a one-liner LC generator that relies
on 32-bit unsigned int overflow; with one multiply and one add it's about
as fast as it gets. I then noticed that the lowest order bit goes 0, 1,
0, 1, 0, 1 ... (and IIRC the 2nd lowest order bit goes pretty much 0, 0,
1, 1, 0, 0, 1, 1 ...). Nonetheless, using the high order bits (as in my
example), it generated surprisingly acceptable uniform integer deviates
in a range [0,n) provided n is not too big.
 
Thomas J. Gritzan

James said:
If the period of the generator is RAND_MAX, the middle loop
should execute RAND_MAX-9, not RAND_MAX, for the two lines to
be the same.

It will. The middle loop doesn't reset i to 0.
 
Kai-Uwe Bux

James Kanze wrote:
[snip]
All linear congruential generators have a problem in the high order
bits, although it probably would require some analysis to
discover. Basically, given the formula
x[i+1] = ((a * x[i]) + b) % m;
if x[i] is less than m/a, then x[i+1] will be very large; the
"randomizing" effect of the modulo doesn't enter into play.
Globally, this will introduce a certain predictability into the
sequence; if you're using the common algorithm of:

(int)(N*(double)rand()/(double)(RAND_MAX+1))

to generate a random integer in the range 0...N-1, then there will
be a very perceptible bias after a 0; if you are using this with
N == 6, for example, to simulate dice, and you've just rolled a
1, the chance of rolling a second 1 is considerably less than
1/6.


I don't see that:

// Simple linear congruential generator: x <- (a*x + b) % m.
class lc_rng {

    unsigned long long x;
    unsigned long long a;
    unsigned long long m;
    unsigned long long b;

public:

    lc_rng ( unsigned long seed,
             unsigned long factor,
             unsigned long modulus,
             unsigned long shift = 0 )
        : x ( seed )
        , a ( factor )
        , m ( modulus )
        , b ( shift )
    {}


    unsigned long bot ( void ) const {
        return ( 0 );
    }

    unsigned long top ( void ) const {
        return ( m );
    }

    // Raw output: x <- (a*x + b) % m, in the range [0, m).
    unsigned long operator() ( void ) {
        x = ( x * a + b ) % m;
        return ( x );
    }

    // Scaled output in [0, N), via the "scale a [0,1) double" method.
    unsigned long operator() ( unsigned long N ) {
        return ( N * (double)( this->operator()() )
                 /
                 (double)( this->top() ) );
    }

};

#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>

int main ( void ) {
    // Park-Miller (modulus as posted; Park-Miller's own is 2147483647)
    lc_rng rng ( 123123, 16807, 214748347, 0 );

    std::vector< unsigned long > count ( 6, 0 );
    for ( unsigned int i = 0; i < 6000000; ++i ) {
        // Skip forward until a 0 is drawn, then tabulate the next draw.
        while ( rng(6) != 0 ) {}
        ++count[ rng(6) ];
    }
    std::copy( count.begin(), count.end(),
               std::ostream_iterator< unsigned long >( std::cout, "\n" ) );
}

As you can see, this tabulates the elements of the sequence immediately
following a 0. Still, I do not see the predicted bias:

news_group> a.out
1000935
998832
1004322
996429
998931
1000551

Again, it's quite possible for an implementation to "mask" this
effect, at least partially, by returning only the low order
bits, or some bits in the middle if it also suffers from the
first problem. But this masking only means that the problem
won't be trivially visible, not that it isn't there.

This borders on metaphysics. The internal state of an RNG is not the measure of
quality. The output sequence is. If that passes all tests (empirical and
theoretical), I would contend the problem has not been _masked_ but
_solved_.



Best

Kai-Uwe Bux
 
Jerry Coffin


[ ... ]
If the period of the generator is RAND_MAX, the middle loop
should execute RAND_MAX-9, not RAND_MAX, for the two lines to
be the same.

Look at it again. The initialization part of the second loop is empty.
 
robertwessel2

[ ... ]
Because it is the same sequence of the RNG...  :)
A seed sequence of rand() always generates
RAND_MAX different random numbers.

False. It usually _will_, but there's no guarantee of it.
So within a sequence it cannot generate the same number twice.

Even more thoroughly false. I've cut out a bunch more statements, but
they all reflect the same misconception.

You're assuming that RAND_MAX reflects not only the range, but also the
period of the PRNG. While the standard's definition is loose enough to
_allow_ that, it's most assuredly not required. I'd add that, IMO, such
an implementation is _far_ from ideal.


FWIW, the LCG PRNG suggested by the standard (see below), and used by
many implementations, has a period of 2**32 (that of its internal state).

Suggested by C standard:

#define RAND_MAX 32767
unsigned long next = 1;

int rand(void)
{
    next = next * 1103515245 + 12345;  /* implicit mod 2**32 when unsigned long is 32 bits */
    return (unsigned int)(next / 65536) % 32768;
}


It's not a great generator, but it's not horrible either.
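
A quick sketch using that suggested implementation to make the
range-versus-period distinction concrete: a return value typically
repeats within a few hundred calls (a birthday-style collision over only
32768 possible outputs), long before the underlying 2^32-state sequence
can have cycled.

#include <cstdint>
#include <iostream>
#include <set>

// The C standard's suggested rand(), reproduced locally so it can be
// experimented with independently of the library's own generator.
static std::uint32_t next_state = 1;

int suggested_rand() {
    next_state = next_state * 1103515245u + 12345u;   // mod 2^32
    return static_cast<int>((next_state / 65536u) % 32768u);
}

int main() {
    std::set<int> seen;
    for (unsigned long call = 1; ; ++call) {
        int r = suggested_rand();
        if (!seen.insert(r).second) {
            // A value came up twice even though the generator's state
            // (and hence its sequence) is nowhere near repeating.
            std::cout << "value " << r << " repeated at call "
                      << call << '\n';
            break;
        }
    }
}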

From a practical viewpoint, having the range and the period of the
generator equal appears to be fairly unusual. For example, consider the
following code:


Indeed, that would normally be horrible, unless the range was very
large, and even then it fundamentally adds an increasing bias as you
get further into the sequence.
 
