rand() % n Revisited

Rich Fife

Quick rand() question:

I know you're not supposed to use "rand() % 1024" for instance,
because it focuses on the lower bits. However, it seems to me that,
given that the argument is not a power of two (or near a power of
two), this is not an issue. The upper bits will participate
equally in the result with the lower. Am I missing something?

Thanks!

-- Rich --
 
Paul Hsieh

Quick rand() question:

I know you're not supposed to use "rand() % 1024" for instance,
because it focuses on the lower bits. However, it seems to me
that given that the argument is not a power of two (or near a
power of two), that this is not an issue. The upper bits will
participate equally in the result with the lower. Am I missing
something?

Yes, you are missing some mathematical analysis to back up what you
just said. If you do (rand() % 1023) on Microsoft Visual C++ or
WATCOM C/C++ (where RAND_MAX is 32767), 32 of the possible outputs
will have an extra 3% bias no matter how good your random number
generator is. No C compiler's rand() that I have ever seen has, by
itself, a worse effect on random output than that.
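
To see where the 3% comes from: 32768 is not a multiple of 1023
(32768 = 32 * 1023 + 32), so 32 of the residues occur 33 times while
the rest occur 32 times. The short tally below illustrates this; it
assumes RAND_MAX is 32767, as on the compilers mentioned above, and is
only a sketch of the arithmetic, not anyone's posted code.

#include <stdio.h>

/* Tally how often each residue of x % 1023 occurs as x runs over the
   32768 possible outputs of a rand() whose RAND_MAX is 32767. */
int main(void)
{
    enum { RANGE = 32768, N = 1023 };
    static int count[N];
    int i;

    for (i = 0; i < RANGE; i++)
        count[i % N]++;

    /* Residues 0..31 occur 33 times; residues 32..1022 occur 32 times,
       so the first 32 outputs are about 3.1% more likely than the rest. */
    printf("count[0] = %d, count[32] = %d\n", count[0], count[32]);
    return 0;
}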

The C.L.C. FAQ gives extremely misleading advice on this point and it
should seriously be ignored. If you want to deal seriously with random
numbers, just read my page about it:

http://www.pobox.com/~qed/random.html

I build up a *REAL* ranged random number generator with reasonable
performance characteristics, culminating in the randrange() function
that removes all the primary problems with ranged random numbers. If
you want something with pure random bit quality you can always use the
Mersenne Twister or Fortuna as a base for my generator function.
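
For reference, the standard way to remove the modulo bias completely is
rejection sampling: throw away raw rand() values that land in the
incomplete block at the top of the range. The function below is only a
minimal sketch of that idea (it is not the randrange() from the page
above), and it assumes nothing beyond a conforming, uniform rand():

#include <stdlib.h>

/* Return a random integer in 0..n-1 with no modulo bias, assuming only
   that rand() is uniform on 0..RAND_MAX.  Requires 0 < n <= RAND_MAX + 1. */
int unbiased_range(int n)
{
    /* Largest multiple of n that fits in the RAND_MAX + 1 possible outputs. */
    unsigned span  = (unsigned)RAND_MAX + 1u;
    unsigned limit = span - span % (unsigned)n;
    unsigned r;

    do {
        r = (unsigned)rand();
    } while (r >= limit);      /* rejects fewer than half the draws */

    return (int)(r % (unsigned)n);
}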
 
lawrence.jones

Paul Hsieh said:
No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you've never seen the truly bad BSD rand() that fostered most of
the paranoia about rand. With it, rand() % 1 generated the sequence 0,
1, 0, 1, 0, 1, ....
 
Kelsey Bjarnason

lawrence.jones said:
Then you've never seen the truly bad BSD rand() that fostered most of
the paranoia about rand. With it, rand() % 1 generated the sequence 0,
1, 0, 1, 0, 1, ....

That would be bad, as rand() % 1 should only ever produce 0.
 
CBFalconer

lawrence.jones said:
Then you've never seen the truly bad BSD rand() that fostered
most of the paranoia about rand. With it, rand() % 1 generated
the sequence 0, 1, 0, 1, 0, 1, ....

FYI the value of "rand() % 1" is identically 0. Except it may be
considerably slower than just writing "0".
 
Phil Carmody

Keith Thompson said:
Obviously Larry is using a font that doesn't distinguish clearly
enough between '%' and '&'. Yeah, that's it.

I had presumed that ``const int l=2;'' was the line before.

Obfuscatorially yours,
Phil
 
Charles Richmond

CBFalconer said:
FYI the value of "rand() % 1" is identically 0. Except it may be
considerably slower than just writing "0".

Not if your compiler has a good optimizer... ;-)
 
Keith Thompson

Charles Richmond said:
Not if your compiler has a good optimizer... ;-)

Maybe. rand() has side effects, so a call to it can't be optimized
away, *unless* the compiler can prove that the side effects don't
affect the program's output.
 
William Hughes

Paul Hsieh said:
Yes, you are missing some mathematical analysis to back up what you
just said. If you do (rand() % 1023) on Microsoft Visual C++ or
WATCOM C/C++, 32 of the possible outputs will have an extra 3% bias no
matter how good your random number generator is.

Well, 3% certainly meets the informal meaning of small. If your
problem is such that you are worried about the 3% you should probably
be more worried about the fact that the rand() you are using has an
output space of only 16 bits.
No C compiler's
rand() that I have ever seen has, by itself, a worse effect on random
output than that.

Then you have not seen a rand() implementation that switched parity
on each call. I understand such an implementation not only existed
but was relatively widespread.
The C.L.C. FAQ about this gives extremely misleading advice on this
point and it should seriously be ignored.

No. Using the advice given in the C.L.C. FAQ means that you will get
reasonable results, even if the rand() implementation is poor, as long
as rand() produces integers that are more or less uniformly distributed
in 0...RAND_MAX. If you use the "rand() % n" technique you have no such
guarantee. The bias is small. Do not confuse detectability with
importance. (The use of "significance" in the term "statistical
significance" leads many people astray.)
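
For concreteness, the approach the C.L.C. FAQ describes scales the raw
value instead of taking it modulo n, so the result is driven by the
high-order bits rather than the low-order ones. A minimal sketch of
that idea (consult the FAQ itself for its exact code and wording):

#include <stdlib.h>

/* Ranged random number driven by the high-order bits of rand(), in the
   spirit of the scaling method the C.L.C. FAQ describes.  Returns a
   value in 0..n-1; assumes 0 < n <= RAND_MAX + 1.  Like rand() % n it
   is still slightly biased when n does not divide RAND_MAX + 1, but it
   does not inherit low-order-bit patterns from a weak rand(). */
int scaled_range(int n)
{
    return (int)((double)rand() / ((double)RAND_MAX + 1.0) * n);
}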

- William Hughes
 
Nate Eldredge

William Hughes said:
Then you have not seen a rand() implementation that switched parity
on each call. I understand such an implementation not only existed
but was relatively widespread.

Right you are. Here is the rand() implementation from the very
influential 4.4BSD-Lite.

#define RAND_MAX 0x7fffffff

static u_long next = 1;

int
rand()
{
        return ((next = next * 1103515245 + 12345) % ((u_long)RAND_MAX + 1));
}

void
srand(seed)
        u_int seed;
{
        next = seed;
}
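
It is easy to check that the low-order bit of that recurrence flips on
every call (the multiplier and the increment are both odd and the
modulus is a power of two), which is exactly the parity switching
mentioned above. A small stand-alone check using the same constants,
offered only as an illustrative sketch:

#include <stdio.h>

/* Run the 4.4BSD-Lite recurrence from seed 1 and print the low bit of
   each output; it alternates 0, 1, 0, 1, ... */
int main(void)
{
    unsigned long next = 1;
    unsigned long out;
    int i;

    for (i = 0; i < 8; i++) {
        next = next * 1103515245UL + 12345UL;
        out = next % 0x80000000UL;          /* what rand() would return */
        printf("%lu ", out % 2UL);
    }
    printf("\n");
    return 0;
}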
 
Paul Hsieh

William Hughes said:
Well, 3% certainly meets the informal meaning of small.

You might like to explain that to the average casino. Card counting
in blackjack gives the player around a 1% advantage over the house,
and the casinos kick out such people whenever they are discovered.
People who attack defective casino games rely on people with
attitudes like yours.

Even if you are implementing something as simple as a 1d20 (where
each choice itself is only 5%) in a Dungeons and Dragons game, the
players will easily see that bias over time.
[...] If your problem is such that you are worried about the 3% you
should probably be more worried about the fact that the rand() you
are using has an output space of only 16 bits.

That statement doesn't follow any line of logic of any relevance. If
you care, then you care, and you want to get a correct ranged random
number generator. If you go up to 32 bits, but still have bias that's
just a little smaller, how can you be happy? And if you want to write
portable code, then what are you going to do?
Then you have not seen a rand() implementation that switched parity
on each call.  I understand such an implementation not only existed
but was relatively widespread.

True enough, but this fundamentally comes from the lack of analysis.
The C.L.C. FAQ just continues this tradition by failing to give
effective analysis of the problem.
No.  Using the advice given in the C.L.C.  means that you will get
reasonable results, even if rand() implementation is poor, as long
as rand() produces integers that are more or less uniformly
distributed in 0...RAND_MAX.

Did you know that a simple counter will produce numbers that are
exactly uniformly distributed in 0 ... RAND_MAX? You know, basic
understanding is sometimes actually useful on occasion.
If you use the "rand() % n" technique you have no such guarantee.

The technique shown in the CLC FAQ also has no such guarantee. It's
totally beside the point.
The bias is small.

Define small. If you want to test how often a hash function will map
to a common bucket, either (rand() % n) or
(rand() * (double) n / (RAND_MAX + 1)) will make no difference. It
will produce worthless results no matter what.
[...] Do not confuse detectability with importance.

I assure you, I am not the one confused. The C.L.C. FAQ is giving a
solution that assumes a policy where low bit determinism is a worse
problem than pure measurable bias and also a worse problem than a
simple range issue. In fact the CLC FAQ is promoting confusion by not
explaining the issue correctly and consequently how one might deal
with the problem.
(The use of "significance" in the term "statistical significance"
leads many people astray).

What has that got to do with anything? If you wish to test something
with a very small probability which is lower than the bias being
introduced by such short-sighted techniques, then what good is the
C.L.C. FAQ's discussion on the subject?

Who actually wants to use a PRNG which is biased or incapable of even
measuring what you want? The subject deserves to be discussed usefully.
The C.L.C. just harps on one single anomaly that has resulted from the
weakness of the ANSI C standard.
 
Richard Bos

Paul Hsieh said:
The C.L.C. FAQ gives extremely misleading advice on this point and it
should seriously be ignored. If you want to deal seriously with random
numbers, just read my page about it:

Or possibly don't. If the code on that page is as rotten as the HTML, I
wouldn't trust it.

Then, search this and other newsgroups for posts by George Marsaglia,
who _really_ knows what a good PRNG is like.

Richard
 
Richard Bos

Paul Hsieh said:
You might like to explain that to the average casino.

Any casino which uses Pseudo-, as opposed to real, Random Numbers is
begging for bankruptcy anyway, no matter how good the Pseudo.
The C.L.C. just harps on one single anomaly that has resulted from the
weakness of the ANSI C standard.

You have a bee in your bonnet. It's your perfect right to enjoy the
buzzing sound, but please don't let it sting the newbies.

Richard
 
William Hughes

Paul Hsieh said:
You might like to explain that to the average casino.


The explanation goes "Even differences that are usually considered
small, e.g. 3%, can be very very important." If the casino is still
in business they will tell me to teach my Grandmother to suck eggs.


Card counting in blackjack gives the player around a 1% advantage
over the house,

Teach your Grandmother to suck eggs
and the casinos kick out such people whenever they are discovered.
People who attack defective casino games rely on people with
attitudes like yours.

Even if you are implementing something as simple as a 1d20 (where
each choice itself is only 5%) in a Dungeons and Dragons game, the
players will easily see that bias over time.

Piffle (even if we are talking about a 3% bias and not the less than
0.1% bias you get with rand() % 20). And even if someone took the
trouble to notice (e.g. tabulated thousands of rolls and applied
statistical techniques) they would find that the bias had no
practical import.
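
(For the record, with RAND_MAX equal to 32767 the arithmetic is:
32768 = 20 * 1638 + 8, so eight of the twenty faces come up 1639 times
out of 32768 and the other twelve come up 1638 times, an excess of
1/1638, or roughly 0.06%.)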
[...] If your problem is such that you are worried about the 3% you
should probably be more worried about the fact that the rand() you
are using has an output space of only 16 bits.

That statement doesn't follow any line of logic of any relevance. If
you care, then you care, and you want to get a correct ranged random
number generator. If you go up to 32 bits, but still have bias that's
just a little smaller, how can you be happy?


E.g. I am interested in tail distributions. The performance of
my random generator has gone from terrible to reasonable.
And if you want to write portable code, then what are you going to do?

Either I don't need much, in which case I can use
the system rand(), or I provide my_rand().
True enough, but this fundamentally comes from the lack of analysis.
The C.L.C. FAQ just continues this tradition by failing to give
effective analysis of the problem.



Did you know that a simple counter will produce numbers that are
exactly uniformly distributed in 0 ... RAND_MAX?


Indeed, one needs more than uniformly distributed. The basic point
remains: the rand() implementation needs to be really bad to produce
unreasonable results with the FAQ technique, but it only needs to be
a bit bad to produce unreasonable results with the rand() % n
technique.


You know, basic understanding is sometimes actually useful on
occasion.


The technique shown in the CLC FAQ also has no such guarantee. It's
totally beside the point.


Define small. If you want to test how often a hash function will map
to a common bucket, either (rand() % n) or
(rand() * (double) n / (RAND_MAX + 1)) will make no difference. It
will produce worthless results no matter what.

No. A test that looks for perfection vs bias will find a bias,
but since there are lots and lots of ways of introducing an
insignificant (note I did _not_ say "statistically insignificant")
bias, a test that looks for perfection vs bias is stupid.

[...] Do not confuse detectability with importance.

I assure you, I am not the one confused. The C.L.C. FAQ is giving a
solution that assumes a policy where low bit determinism is a worse
problem than pure measurable bias and also a worse problem than a
simple range issue. In fact the CLC FAQ is promoting confusion by not
explaining the issue correctly and consequently how one might deal
with the problem.
(The use of "significance" in the term "statistical significance"
leads many people astray).

What has that got to do with anything? If you wish to test something
with a very small probability which is lower than the bias being
introduced by such short-sighted techniques, then what good is the
C.L.C. FAQ's discussion on the subject?

Who actually wants to use a PRNG which is biased

[I recall a wonderful poem about an archer who claimed he was best
because, although he never came near the target, he was unbiased.
Lack of bias is not everything!]


Lots of people don't care a fig. If I want to shuffle cards for a
bridge game then I don't care about a 3% bias. (If I want to shuffle
cards for a computer poker game, then the fact that the average
rand() is about as cryptographically secure as a Caesar cypher is
more important than a 3% bias.) If the wumpus alternates between
being in the left half of the maze and the right half of the maze,
I care a lot!

The CLC FAQ solves a real (although probably now historical) problem.

The system rand() may not be suitable for many applications.
Fixing one problem, which is not a problem in most applications
where the system rand() is suitable, does not magically make rand()
produce high quality random numbers.

- William Hughes
 
user923005

William Hughes said:
The CLC FAQ solves a real (although probably now historical) problem.

The system rand() may not be suitable for many applications.
Fixing one problem, which is not a problem in most applications
where the system rand() is suitable, does not magically make rand()
produce high quality random numbers.

An easy solution for this problem is to use the Mersenne Twister.
True, the quality of the rand() implementations packaged with C
compilers is spotty. Equally true, the excellence of the open-source
Mersenne Twister is well documented.

Suggestions to the ANSI C committee:
1. Make a version of the Mersenne Twister the standard C library
implementation.
2. Have versions that produce int (0 to INT_MAX), long long
(0 to LLONG_MAX), and double (0.0 to 1.0) outputs.
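
As a sketch of what suggestion 2 could look like when layered over the
freely available reference implementation (mt19937ar.c, whose entry
points are init_genrand() and genrand_int32(); if your copy differs,
adjust the names), one might write wrappers like these. The wrapper
names and exact ranges are illustrative assumptions, not proposed
standard wording:

#include <limits.h>

unsigned long genrand_int32(void);   /* from mt19937ar.c: 32 random bits */
void init_genrand(unsigned long s);  /* from mt19937ar.c: seed the state */

/* Uniform int in 0..INT_MAX (keeps the low 31 bits on a 32-bit int). */
int mt_rand_int(void)
{
    return (int)(genrand_int32() & (unsigned long)INT_MAX);
}

/* Uniform long long in 0..LLONG_MAX, built from two 32-bit draws. */
long long mt_rand_llong(void)
{
    unsigned long long hi = genrand_int32();
    unsigned long long lo = genrand_int32();
    return (long long)(((hi << 32) | lo) & (unsigned long long)LLONG_MAX);
}

/* Uniform double in [0.0, 1.0), i.e. genrand_int32() divided by 2^32. */
double mt_rand_double(void)
{
    return genrand_int32() * (1.0 / 4294967296.0);
}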

A final suggestion would be to carefully analyze available C library
source to find a "best of breed" class of standard library functions
{for the subset of functions that are fully portable} and have this
code base become a default library implementation. Then, when the
compiler vendor wants to write a snappy compiler, the workload will
consist only of files that improve upon the default set.
 
Paul Hsieh

My understanding is that slot machines do in fact use pseudorandom generators.

In fact *EVERY* electronic casino game (including video poker, for
example) uses a pseudorandom generator. Obviously, I have not checked
every one to be sure, so how can I make such a statement?

This is the real crime being perpetrated by the CLC FAQ -- it's
teaching and fostering ignorance. A correctly used PRNG (i.e., with a
sufficient dose of entropy) is, of course, the *BEST POSSIBLE* source
of random numbers (better than observing background radiation, or
whatever other crazy schemes people have dreamed up to try to obtain
"pure random" when they don't know the difference between bias and
determinism) and it's also ridiculously cheap. You just have to know
what you are doing.

It doesn't take much to understand why this is the case. But of
course, some people think that ignorance and misinformation is a
better way of understanding random numbers.
 
