Random Numbers


Ioannis Vranos

I am reading TC++PL3 and on page 685 it is written:

"Producing a good random-number generator isn't easy, and unfortunately
not all systems deliver a good rand(). In particular, the low-order bits
of a random number are often suspect, so rand() %n is not a good
portable way of generating a random number between 0 and n-1. Often,
int((double(rand())/RAND_MAX) *n) gives acceptable results. However, to
seriously use that formula, we must take care of the minuscule
probability that the result will be n".


Q1: What are the "low-order bits"?

Q2: "we must take care of the minuscule probability that the result will
be n". This means the result of rand() call, or the result of the
formula? And why?
 

Michael DOUBEZ

Ioannis Vranos wrote:
I am reading TC++PL3 and on page 685 it is written:

"Producing a good random-number generator isn't easy, and unfortunately
not all systems deliver a good rand(). In particular, the low-order bits
of a random number are often suspect, so rand() %n is not a good
portable way of generating a random number between 0 and n-1. Often,
int((double(rand())/RAND_MAX) *n) gives acceptable results. However, to
seriously use that formula, we must take care of the minuscule
probability that the result will be n".


Q1: What are the "low-order bits"?

They are the least significant bits of the integer.

Q2: "we must take care of the minuscule probability that the result will
be n". This means the result of rand() call, or the result of the
formula? And why?

rand() can yield RAND_MAX, so double(rand())/RAND_MAX may be exactly 1.0,
and int((double(rand())/RAND_MAX) * n) may then be n.

What is wanted is a number in the range [0, n), so n itself must be
excluded.
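One common way to repair the formula (a sketch, not from the book; the helper name is mine) is to divide by RAND_MAX + 1.0 rather than RAND_MAX, so the ratio is strictly below 1.0 and the scaled result can never reach n:

```cpp
#include <cstdlib>

// Scale a rand() result into [0, n). Dividing by RAND_MAX + 1.0
// (instead of RAND_MAX) keeps the ratio strictly below 1.0, so the
// product is strictly below n and the truncated result never equals n.
int scaled_rand(int n)
{
    return static_cast<int>(static_cast<double>(std::rand())
                            / (static_cast<double>(RAND_MAX) + 1.0) * n);
}
```

Note that this only fixes the boundary case; the distribution over [0, n) is still only approximately uniform unless n divides RAND_MAX + 1 evenly.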

Michael
 

jkherciueh

Ioannis said:
I am reading TC++PL3 and on page 685 it is written:

"Producing a good random-number generator isn't easy, and unfortunately
not all systems deliver a good rand(). In particular, the low-order bits
of a random number are often suspect, so rand() %n is not a good
portable way of generating a random number between 0 and n-1. Often,
int((double(rand())/RAND_MAX) *n) gives acceptable results. However, to
seriously use that formula, we must take care of the minuscule
probability that the result will be n".


Q1: What are the "low-order bits"?

The low-order bits are the ones that determine the congruence classes mod
2^n for small n. The lowest bit determines whether a number is even or odd.

Linear congruence RNGs can suffer from a small period in their low-order
bits. Some are so weak that they produce even and odd results alternately.
(I have, however, never seen a rand() that does that, but then again, I
didn't search for one either.)

Q2: "we must take care of the minuscule probability that the result will
be n". This means the result of rand() call, or the result of the
formula? And why?

The formula. The reason is that rand() might return RAND_MAX.


BTW: no simple rescaling formula is going to give you a mathematically
correct uniform sample in [0,n) for an n that does not divide evenly into
RAND_MAX+1. In order to do it right (and not just approximately right), you
will have to discard some values and throw the dice again. Also, it is a
little tricky to upscale things for the case n > RAND_MAX+1. Whether the
difference between "right" and "approximately right" matters depends on
your application.
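The discard-and-retry approach can be sketched like this (my code, not Kai-Uwe's; it selects the outcome by division rather than %, so the high-order bits of rand() decide the result):

```cpp
#include <cstdlib>

// Rejection sampling: draw uniformly from [0, n) given rand() in
// [0, RAND_MAX]. The first `limit` values are partitioned into n equal
// buckets of size `bucket`; whenever rand() lands in the uneven tail we
// throw the dice again, so every outcome is exactly equally likely.
int uniform_below(int n)
{
    const long long range  = static_cast<long long>(RAND_MAX) + 1; // count of rand() values
    const long long bucket = range / n;                            // values per outcome
    const long long limit  = bucket * n;                           // usable values
    long long r;
    do {
        r = std::rand();
    } while (r >= limit);      // discard the tail and start over
    return static_cast<int>(r / bucket);
}
```

The expected number of retries is tiny whenever n is much smaller than RAND_MAX.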

The next version of the standard will have uniform_int_distribution, which
is supposed to take care of this kind of problem. Currently, there is
Boost::random, but its uniform_int has a bug (wasn't fixed last time I
checked) that makes it non-uniform in some cases.
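For reference (written after the fact; the thread predates the final standard), this is roughly what the facility looks like in the <random> header that eventually shipped with C++11:

```cpp
#include <random>

// C++11 <random>: the distribution handles the rejection/rescaling
// details, and the engine (here mt19937) has fully specified,
// reproducible behavior.
int roll_die(std::mt19937& gen)
{
    std::uniform_int_distribution<int> dist(1, 6); // closed range [1, 6]
    return dist(gen);
}
```

Unlike rand(), the engine's output sequence for a given seed is mandated by the standard, so results are reproducible across implementations.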


Best

Kai-Uwe Bux
 

Lionel B

Linear congruence RNGs can suffer from a small period in their low-order
bits. Some are so weak that they produce even and odd results alternately.
(I have, however, never seen a rand() that does that, but then again, I
didn't search for one either.)

FYI, the "quick and dirty" PRNG "ranqd1" in "Numerical Recipes in C" does
this. It's a one-liner linear congruential generator which uses one
multiply and one add (32-bit unsigned int) and is thus about as fast as
it gets.

Despite the (very) weak low-order randomness, I'd hazard that it's not much
worse than many system-supplied PRNGs, so long as you know what you're
dealing with and, of course, you don't mod it!
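A minimal sketch of ranqd1's update step (constants as given in Numerical Recipes) makes the low-order weakness easy to see: since both the multiplier and the increment are odd, bit 0 of the state flips on every call.

```cpp
#include <cstdint>

// The "quick and dirty" one-liner LCG from Numerical Recipes
// (a = 1664525, c = 1013904223, modulus 2^32 via uint32_t wraparound).
// Because a and c are both odd, the low bit of the new state is the
// old low bit XOR 1 -- bit 0 has period exactly 2.
uint32_t ranqd1_step(uint32_t x)
{
    return 1664525u * x + 1013904223u;
}
```

So (x % 2) produces 0, 1, 0, 1, ... regardless of the seed.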
 

James Kanze

The low-order bits are the ones that determine the congruence
classes mod 2^n for small n. The lowest bit determines whether
a number is even or odd.
Linear congruence RNGs can suffer from a small period in their
low-order bits. Some are so weak that they produce even and
odd results alternately. (I have, however, never seen a
rand() that does that, but then again, I didn't search for
one either.)

They can, but they don't have to. (And I've seen many that do.)

IMHO, I don't think the advice above is really that good. If
the random number generator is bad, it's bad. Changing the way
you exploit the value may hide the badness superficially, but it
doesn't make the results any more random.
The formula. The reason is that rand() might return RAND_MAX.
BTW: no simple rescaling formula is going to give you a mathematically
correct uniform sample in [0,n) for an n that does not divide evenly into
RAND_MAX+1. In order to do it right (and not just approximately right), you
will have to discard some values and throw the dice again. Also, it is a
little tricky to upscale things for the case n > RAND_MAX+1. Whether the
difference between "right" and "approximately right" matters depends on
your application.
The next version of the standard will have uniform_int_distribution, which
is supposed to take care of this kind of problem. Currently, there is
Boost::random, but its uniform_int has a bug (wasn't fixed last time I
checked) that makes it non-uniform in some cases.

Handling the problem you mention isn't that hard, and has been
discussed here several times. What's good in the random
generators from Boost (and TR1?) is that they have guaranteed
behavior. The characteristics are part of the specification.
 

jkherciueh

James said:
They can, but they don't have to. (And I've seen many that do. )

IMHO, I don't think the advice above is really that good. If
the random number generator is bad, it's bad. Changing the way
you exploit the value may hide the badness superficially, but it
doesn't make the results any more random.

That is not entirely correct. A linear congruence RNG with modulus 2^n can
be a perfectly sound implementation of a fair coin if you use the highest
bit as the result of the toss. If you use the lowest bit instead, you are
very likely to get a rather poor random bit. And this is not superficially
hiding problems: the first implementation can (and likely will) pass
statistical tests whereas the second can (and likely will) fail.

In some sense, the lower-order bits of linear congruence RNGs could be
considered part of their internal state that should not really be exposed
to the client.
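A sketch of that distinction (the constants are the Numerical Recipes ones, chosen purely for illustration, not tied to any particular rand()):

```cpp
#include <cstdint>

// Full-period mixed LCG mod 2^32. The top bit makes a plausible coin;
// the bottom bit is worthless, since it simply alternates on every call.
struct Lcg32 {
    uint32_t state;
    explicit Lcg32(uint32_t seed) : state(seed) {}
    uint32_t next() { state = 1664525u * state + 1013904223u; return state; }
    int coin_good() { return static_cast<int>(next() >> 31); } // high bit
    int coin_bad()  { return static_cast<int>(next() & 1u);  } // low bit: period 2
};
```

The same state sequence backs both coins; only the choice of which bit to expose separates a usable toss from a broken one.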

The formula. The reason is that rand() might return RAND_MAX.
BTW: no simple rescaling formula is going to give you a mathematically
correct uniform sample in [0,n) for an n that does not divide evenly into
RAND_MAX+1. In order to do it right (and not just approximately right),
you will have to discard some values and throw the dice again. Also, it
is a little tricky to upscale things for the case n > RAND_MAX+1. Whether
the difference between "right" and "approximately right" matters depends
on your application.
The next version of the standard will have uniform_int_distribution,
which is supposed to take care of this kind of problem. Currently, there
is Boost::random, but its uniform_int has a bug (wasn't fixed last time I
checked) that makes it non-uniform in some cases.

Handling the problem you mention isn't that hard, and has been
discussed here several times.

The only problem I claimed to be non-trivial is finding a random
number in [0,n) using calls to an underlying RNG that gives values in [0,m)
for some fixed m < n. In that case you need to collect several samples and
assemble the result from them (and you might still need to discard the
outcome and start over).

The problem that has been discussed on this list several times is for m > n.

What's good in the random
generators from Boost (and TR1?) is that they have guaranteed
behavior. The characteristics are part of the specification.

More or less. This goes for the engines (the building blocks). As far as
adaptors are concerned, their specifications are a little more vague (as
one would expect). E.g., it is not specified whether uniform_int will
be more sensitive to low-order bits or to high-order bits of the underlying
engine. It is also not specified how many calls (on average) to the
underlying engine will be needed.

Also, what happens if you use uniform_int with an underlying URNG that has
a floating point type as its result_type is somewhat beyond comprehension.
It is a good thing that C++0X moved away from the specifications in TR1 and
reworked <random> entirely.

Of course, the flip-side of your observation that Boost and TR1 provide
specified characteristics is that you have no guarantees at all for rand().
If you are coding to a given platform, the implementation should document
the characteristics of rand(), but if you are writing code that is meant to
be portable, things get tricky: the library vendor has to implement rand()
based on the best guesses about the needs of the customers and different
library vendors will cater to different needs. Thus, you have no idea which
RNG you will get when using rand(). Of course, whether this matters depends
on the application.


Best

Kai-Uwe Bux
 

James Kanze

jkherciueh wrote:
That is not entirely correct. A linear congruence RNG with modulus 2^n can
be a perfectly sound implementation of a fair coin if you use the highest
bit as the result of the toss. If you use the lowest bit instead, you are
very likely to get a rather poor random bit. And this is not superficially
hiding problems: the first implementation can (and likely will) pass
statistical tests whereas the second can (and likely will) fail.
In some sense, the lower-order bits of linear congruence RNGs could be
considered part of their internal state that should not really be exposed
to the client.

I based my statement on the article "Random Number Generators:
Good Ones Are Hard to Find", by Park and Miller (CACM, Oct.
1988). Could you point me to some further analysis to support
what you are saying?
Q2: "we must take care of the minuscule probability that the
result will be n". This means the result of rand() call, or
the result of the formula? And why?
The formula. The reason is that rand() might return RAND_MAX.
BTW: no simple rescaling formula is going to give you a
mathematically correct uniform sample in [0,n) for an n
that does not divide evenly into RAND_MAX+1. In order to do
it right (and not just approximately right), you will have
to discard some values and throw the dice again. Also, it
is a little tricky to upscale things for the case n >
RAND_MAX+1. Whether the difference between "right" and
"approximately right" matters depends on your application.
The next version of the standard will have
uniform_int_distribution, which is supposed to take care of
this kind of problem. Currently, there is Boost::random,
but its uniform_int has a bug (wasn't fixed last time I
checked) that makes it non-uniform in some cases.
Handling the problem you mention isn't that hard, and has
been discussed here several times.
The only problem of which I claimed it is non-trivial is to
find a random number in [0,n) using calls to an underlying RNG
that gives values in [0,m) for some fixed m < n. In that case
you need to collect several samples and assemble the result
from there (and still you might need to discard the outcome
and start over).
The problem that has been discussed on this list several times
is for m > n.

OK. I misunderstood that.
More or less. This goes for the engines (the building blocks).
As far as adaptors are concerned, their specifications are a
little more vague (as one would expect). E.g., it is not
specified whether uniform_int used will be more sensitive to
low-order bits or to high-order bits of the underlying engine.
It is also not specified how many calls (on average) of the
underlying engine will be needed.

Yes, but if you use a good engine, it doesn't matter.
Also, what happens if you use uniform_int with an underlying
URNG that has a floating point type as its result_type is
somewhat beyond comprehension. It is a good thing that C++0X
moved away from the specifications in TR1 and reworked
<random> entirely.
Of course, the flip-side of your observation that Boost and
TR1 provide specified characteristics is that you have no
guarantees at all for rand().

Exactly. As far as the standard goes,
int rand() { return 0; }
is perfectly compliant (although obviously not in keeping with the
intent). Practically, the C standard gives an example
implementation which corresponds to a very poor generator (at
least according to Park and Miller), and a lot of
implementations of the C library use (or used) this example.
If you are coding to a given platform, the implementation
should document the characteristics of rand(), but if you are
writing code that is meant to be portable, things get tricky:
the library vendor has to implement rand() based on the best
guesses about the needs of the customers and different library
vendors will cater to different needs. Thus, you have no idea
which RNG you will get when using rand(). Of course, whether
this matters depends on the application.

It's probably best to suppose that rand() isn't very good, and
try for something better, if it matters.
 

jkherciueh

James said:
I based my statement on the article "Random Number Generators:
Good Ones Are Hard to Find", by Park and Miller (CACM, Oct.
1988).

That paper mostly concerns itself with choosing the right modulus and
multiplier. It does, however, in passing, also address the question of how
to turn the result into a random bit: on page 1200, left column, it points
out that the k-th bit in Grogono's RNG cycles with period 2^k. And they
quote Knuth to point out that this is typical for mixed linear congruence
RNGs with modulus 2^n. We shall see below what the reason is.

Could you point me to some further analysis to support what you are
saying?

What I was thinking of is mostly the Table 1 in Section 3.3.4 of Knuth TAOCP
(I have the third edition, published several years after Park and Miller).
It lists the results of the spectral test for various linear congruence
RNGs and shows that there are some pretty decent ones even with modulus
2^n. Knuth also points out a reason as to why the spectral test is
relevant: "all generators known to be bad actually fail it" [beginning of
3.3.4].

Note that Park and Miller also acknowledge that with a carefully chosen
multiplier, linear congruence RNGs with modulus 2^n can have period 2^n/4,
and mixed linear congruence RNGs can have full period 2^n: in fact they do
if the multiplier is congruent to 1 mod 4 and the shift constant is odd
[Knuth, 3.2.1.2 Theorem A].

Now for the claim that any (mixed) linear congruence RNG with modulus 2^n
will have weak low-order bits: consider a mixed linear congruence RNG

x = ( a * x + c ) % 2^n

that has full period (i.e., a is congruent to 1 mod 4 and c is odd). Then
the lowest k bits obey the recursion

y = ( a * y + c ) % 2^k,

which also has full period 2^k (since a and c did not change!). In
particular, the lowest-order bit always alternates.

On the other hand, the highest-order bit alone has full period 2^n. For
contradiction, assume it had period 2^m with m < n (note that the period
must be a divisor of 2^n and hence a power of 2). Then, since the lowest
n-1 bits have period 2^(n-1), the overall period would be LCM(2^m, 2^(n-1)),
which is 2^(n-1), contradicting the assumption that the RNG has full period
2^n.

As for the quality of the high-order bits, one should also consider what
the results of the spectral test (and many others) mean. The spectral test
in and of itself applies to a sequence of real numbers in [0,1) that are
presented as a sequence of independent, uniformly distributed values. The
results from the RNG are translated into real numbers by dividing each by
the modulus (2^n). Note that the highest-order bit tells you which half of
the unit interval contains the number. Thus, if the tests say that the
sequence is good, they also say (at least morally) that the high-order bits
are good, whereas the lowest-order bits are insignificant since they only
amount to small perturbations of the real number.

My remark that the low-order bits should be considered part of the internal
state derives largely from this. E.g., in Table 1 [mentioned above], line
26 there is a good multiplier for the modulus 2^64. I would not hesitate to
use that with an unsigned long long as the internal state. However, I would
have the generator return only an unsigned long, exposing just the highest
32 bits of the state to the client. Then even the lowest bit of the
observable sequence will have period 2^33, and the resulting RNG can be
expected to pass all tests with flying colors (at least it will pass the
spectral test).
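A sketch of that construction (the multiplier and increment here are Knuth's MMIX constants, used purely for illustration and not necessarily the Table 1, line 26 entry):

```cpp
#include <cstdint>

// Keep 64 bits of LCG state but expose only the top 32 bits, so the
// weak low-order bits stay internal to the generator.
struct Lcg64Top32 {
    uint64_t state;
    explicit Lcg64Top32(uint64_t seed) : state(seed) {}
    uint32_t next()
    {
        // MMIX-style full-period mixed LCG mod 2^64 (illustrative constants)
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        return static_cast<uint32_t>(state >> 32); // expose high half only
    }
};
```

Even bit 0 of the output is bit 32 of the state, which by the argument above has period 2^33.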


Best

Kai-Uwe Bux
 

Ioannis Vranos

Michael said:
Ioannis Vranos wrote:
I am reading TC++PL3 and on page 685 it is written:

"Producing a good random-number generator isn't easy, and
unfortunately not all systems deliver a good rand(). In particular,
the low-order bits of a random number are often suspect, so rand() %n
is not a good portable way of generating a random number between 0 and
n-1. Often,
int((double(rand())/RAND_MAX) *n) gives acceptable results. However,
to seriously use that formula, we must take care of the minuscule
probability that the result will be n".

Q2: "we must take care of the minuscule probability that the result
will be n". This means the result of rand() call, or the result of the
formula? And why?

rand() can yield RAND_MAX, so double(rand())/RAND_MAX may be exactly 1.0,
and int((double(rand())/RAND_MAX) * n) may then be n.

What is wanted is a number in the range [0, n), so n itself must be
excluded.

I suppose the range [0, n) is [0, n-1] here. So what if we considered
the formula to produce the range [0, n] instead? Is there some flaw in this?
 

jkherciueh

Ioannis said:
Michael said:
Ioannis Vranos wrote:
I am reading TC++PL3 and on page 685 it is written:

"Producing a good random-number generator isn't easy, and
unfortunately not all systems deliver a good rand(). In particular,
the low-order bits of a random number are often suspect, so rand() %n
is not a good portable way of generating a random number between 0 and
n-1. Often,
int((double(rand())/RAND_MAX) *n) gives acceptable results. However,
to seriously use that formula, we must take care of the minuscule
probability that the result will be n".

Q2: "we must take care of the minuscule probability that the result
will be n". This means the result of rand() call, or the result of the
formula? And why?

rand() can yield RAND_MAX, so double(rand())/RAND_MAX may be exactly 1.0,
and int((double(rand())/RAND_MAX) * n) may then be n.

What is wanted is a number in the range [0, n), so n itself must be
excluded.

I suppose the range [0, n) is [0, n-1] here. So what if we considered
the formula to produce the range [0, n] instead? Is there some flaw in this?

You mean other than that it is grossly non-uniform?

Suppose n = 9. If you use the formula above to create a random decimal digit
in {0,1,2,3,4,5,6,7,8,9}, the probability of getting 9 is 1/(1+RAND_MAX),
which will not be 0.1 unless RAND_MAX = 9 (which is rather unlikely).


Best

Kai-Uwe Bux
 
