How do you guys limit rand() results to a range?

D

DFS

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121
C=1430690
D=1428133
E=1429276
F=1426874
G=1428952
Total=10000000

the article above gives another way
r = rand() / ((RAND_MAX / N) + 1)
A=1426636
B=1430108
C=1427855
D=1428752
E=1429170
F=1428667
G=1428812
Total=10000000

To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.

(intentionally clunky test) code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

char randomLetter(int runType) {

int i, r;
int countLetters = 7;
int Acounter = 0, Bcounter = 0, Ccounter = 0, Dcounter = 0;
int Ecounter = 0, Fcounter = 0, Gcounter = 0;

srand(time(NULL));

for (i=1; i<=10000000;i++)
{
if (runType == 1) {r = rand() % (countLetters);}
if (runType == 2) {r = rand() / ((RAND_MAX / countLetters) + 1);}
if (r == 0) Acounter++;
if (r == 1) Bcounter++;
if (r == 2) Ccounter++;
if (r == 3) Dcounter++;
if (r == 4) Ecounter++;
if (r == 5) Fcounter++;
if (r == 6) Gcounter++;
}

if (runType == 1) printf("r = rand() %% %d\n", countLetters);
if (runType == 2) printf("r = rand() / ((RAND_MAX / %d) + 1)\n",
countLetters);

printf("A=%d\n",Acounter);
printf("B=%d\n",Bcounter);
printf("C=%d\n",Ccounter);
printf("D=%d\n",Dcounter);
printf("E=%d\n",Ecounter);
printf("F=%d\n",Fcounter);
printf("G=%d\n",Gcounter);
printf("Total=%d\n",Acounter+Bcounter+Ccounter+Dcounter+Ecounter+Fcounter+Gcounter);

return 0;
}

int main(void) {
randomLetter(1);
randomLetter(2);
return 0;
}

K

Kaz Kylheku

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

This is not viable if you require an even disribution. However, if you
have such specific requirements of a PRNG, then rand itself isn't viable.

(So basically, in situations where rand is acceptable, modulo
reduction is okay.)
r = rand() % N
A=1428954
B=1427121
C=1430690
D=1428133
E=1429276
F=1426874
G=1428952
Total=10000000

the article above gives another way
r = rand() / ((RAND_MAX / N) + 1)
A=1426636
B=1430108
C=1427855
D=1428752
E=1429170
F=1428667
G=1428812
Total=10000000

To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.

Modulo doesn't do that. Suppose you have a very good source of 8 bit random
numbers in the range [0,256), and you reduce these, say, modulo 200 to the
range in the range [0, 200). The problem is that basically, this operation
chops off the values in the range [200,256) and folds them to the range
[0,56). So these values are twice as likely to occur.

Every modulus M1 which is not a perfect divisor of the original modulus M0 has
this problem to some extent. Of course, if M1 << M0, the problem is
less noticeable. For instance if [0, 256) is reduced into [0, 5), the
discrepancy between the more densely produced values and less densely
produced values is a lot smaller.

M

Malcolm McLean

On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.
As Kaz says, normally that's OK. Some really old implementations
of rand() returned alternating bottom bits so %2 in particular
was dicey, but that's rare now.Note that normally you want the range 0 to N-1.I often write

#define uniform() (rand()/(RAND_MAX + 1.0))

which produces a random number on 0 to 1 minus a tiny bit. So N * uniform()
will always produce a number on 0 to N-1.
However mainly I use uniform() for when I want a random number on 0 to 1,
and the modulus method for random integers.

A

ais523

DFS said:

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N. [snip]
the article above gives another way
r = rand() / ((RAND_MAX / N) + 1) [snip]
To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.

If your random number generator is sufficiently random, both methods
give the same result; they limit the result to a range (with the higher
values being slightly less likely to come up, if N doesn't divide
exactly into RAND_MAX).

In some C implementations (especially older ones), the random-number
generators in the standard libraries (rand(), etc.) are not particularly
random, especially in the less significant bits; for instance, some of
them alternate between odd and even numbers. The modulus method for
reducing the range makes this problem worse, whereas the division method

So some programmers prefer the modulus method because it's clearer; and
some programmers prefer the division method because it produces the same
results if the random number generator works properly, and better
results if it's broken.

B

BartC

DFS said:
To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.
char randomLetter(int runType) {
if (runType == 1) {r = rand() % (countLetters);}
if (runType == 2) {r = rand() / ((RAND_MAX / countLetters) + 1);}
if (runType == 1) printf("r = rand() %% %d\n", countLetters);
if (runType == 2) printf("r = rand() / ((RAND_MAX / %d) + 1)\n",
countLetters);

Try the same code with a much narrow source of random numbers. For example,
use:

#define MYRAND_MAX 15
int myrand(void) { return rand() & MYRAND_MAX;}

and replace uses of rand() with myrand(), and RAND_MAX with MYRAND_MAX. Then
any non-uniform results are more obvious.

(For my purposes I just use the % method but I'm not fussy about
distribution.)

S

Stefan Ram

DFS said:
To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.

When there are M numbers that can be returned by rand(), and
we make N possible results from that by either »%« or »/« or
any other means that will always map a number m to the same
number n and m cannot be divided by n, then some of these n
numbers will be mapped from ( m div n ) numbers, others from
( m div n )+ 1 numbers.

The correct solution is to use calls of rand() to assemble a
number with decimal places between 0 and 1. The more decimal
places, the better, but usually a few calls of rand() will
suffice. The, this number can be scaled (multiplied) to get
the result.

S

Stefan Ram

Supersedes: <[email protected]>

To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.

When there are M numbers that can be returned by rand(), and
we make N possible results from that by either »%« or »/« or
any other means, we will always map the same number m to the
same number n. If m cannot be divided by n, then some of
those n numbers will be mapped from ( m div n ) different
rand() results, others not.

The correct solution is to use (possibly: multiple) calls of
rand() to assemble a number with decimal places, so that the
value of this number is between 0 and 1. The more decimal
places, the better, but usually a few calls of rand() will
suffice. Then, this number can be scaled (multiplied) and
truncated to get the result.

M

Malcolm McLean

The correct solution is to use (possibly: multiple) calls of
rand() to assemble a number with decimal places, so that the
value of this number is between 0 and 1. The more decimal
places, the better, but usually a few calls of rand() will
suffice. Then, this number can be scaled (multiplied) and
truncated to get the result.
A floating point number consists of a sign, a mantissa, and an
exponent. The sign is easy, it's always positive. The mantissa
can be set to random bits. The exponent is tricky. There's a
50% chance it should be in the range 0.5 - 1.0 (so -1, after
debiasing). There's a 25% chance it should be -2, a 12.5%
chance it should be -3, and so on. If you want denormalised
numbers you've then got to apply a little bit more logic,
because you've got to start shifting down the mantissa. But
that's highly unlikely to be ever executed.

So basically you need to create a random bitstream, then
read it until you get the first non-zero bit. That tells you
what the exponent should be. In almost all cases, that's one
call to rand(), then two calls to fill out the mantissa. So
three calls in total for a double resolution random number.

R

Rosario193

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121

random unsigned in the range [a, b] b>a would be a+rand()%(b-a)
if rand() is statistical ok

R

Rosario193

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121

random unsigned in the range [a, b] b>a would be a+rand()%(b-a)
if rand() is statistical ok

if rand() is not statistical ok one can try something as

a+(rand()>>2)%(b-a)

S

Stefan Ram

Malcolm McLean said:
A floating point number consists of a sign, a mantissa, and an

For a number between 0 and 1, unsigned fixed-point is fine.

The concatenation of "0." with the results of n rand()
calls, each one giving the value of a digit, already is a
fixed-point number between 0 and 1 in the base of RAND_MAX.

(I already wrote C code for that, but can't find it now, and
I do not have time right now to write it anew. Otherwise,
implement arithmetics in base RAND_MAX, because these
calculations are just the abstract model, but the
implementation can use simplifications with the same effect.)

S

Stefan Ram

Richard Damon said:
The method I prefer to get really uniform numbers between 0 and N-1 is
to find a k such that k*N <= M (k = trunc(M/N) works well), and if the
results of rand are greater than or equal to k*N, get another random
number, else divide by k and return the results.

Assume that in the given application it is acceptable to
wait t seconds for the next random number, then the
possibility that the above procedure will need more time
than t seconds is greater than 0.

It is possible that it will not have terminated before the
end of the universe, albeit by a very little possibility.

K

Keith Thompson

Supersedes: <[email protected]>

When there are M numbers that can be returned by rand(), and
we make N possible results from that by either Â»%Â« or Â»/Â« or
any other means, we will always map the same number m to the
same number n. If m cannot be divided by n, then some of
those n numbers will be mapped from ( m div n ) different
rand() results, others not.

The correct solution is to use (possibly: multiple) calls of
rand() to assemble a number with decimal places, so that the
value of this number is between 0 and 1. The more decimal
places, the better, but usually a few calls of rand() will
suffice. Then, this number can be scaled (multiplied) and
truncated to get the result.

Is "a number with decimal places" a verbose way of saying
"floating-point value", or do you mean something else?

I don't think that's the best approach. I'll post a followup to

S

Stefan Ram

(I already wrote C code for that, but can't find it now, and
I do not have time right now to write it anew. Otherwise,

#include <stdio.h>
#include <stdlib.h>

/* returns a uniformly-distributed integral number i,
with min <= i < top and with a number of rand() calls
that is smaller than a certain small number (like 4)
that depends on top-min and RAND_MAX. */

int number( double min, double top )
{ { double step; do
{ step =( top - min )/( RAND_MAX + 1. );
min = min + rand() * step; top = min + step; }
while( min < top ); } return( int )min; }

int main( void )
{ int a[ 51 ];
for( int i = 0; i < 51; ++i )a[ i ]= 0;
for( int i = 0; i < 1e6; ++i )++a[ number( 1, 50 )];
for( int i = 0; i < 51; ++i )
{ printf((( i + 1 )/ 10 * 10 ==( i + 1 ))?"%5d\n" : "%5d ", a[ i ]); }}

K

Keith Thompson

DFS said:

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N [...]
the article above gives another way
r = rand() / ((RAND_MAX / N) + 1)
[...]

The comp.lang.c FAQ is at http://www.c-faq.com/. You've just asked
question 13.16.

As others have pointed out, you can't generate uniformly distributed
numbers in the range 0..N-1 using a single call to rand() if RAND_MAX+1
is not a multiple of N. Some numbers will occur more often than others.
You can play tricks to change *which* numbers occur more often, but
you can't avoid the problem.

If that doesn't matter, the rand() / ((RAND_MAX / N) + 1) approach is
probably the best.

If it does (and it probably should), you can use the method described in
the FAQ, which requires calling rand() multiple times for each result
(the number of calls will average slightly more than 1).

The FAQ mentions that some rand() implementations return results that
alternate odd vs. even, which is why it advises against the rand() % N
method. I've recently learned that some current implementations still
do this (specifically some of the BSD variants, though Mac OS doesn't
have this problem). The sample implementation given in the C standard
does *not* have this problem -- which makes me wonder why the BSD
implementations haven't been updated.

It's worth noting that rand() implementations, even ones without the
above problem, tend not to produce high-quality random numbers. For one
thing, they're constrained by the requirememt that rand() must always
return the same sequence for a given seed.

There are other non-standard functions that are much better. See your

M

Malcolm McLean

There are other non-standard functions that are much better. See your
Absolutely don't.

rand() is a bit shuffling function. To get an implementation, look up
an algorithm, and cut and paste the C source.
Then the random number generator works consistently, whatever platform
your program happens to run on.

K

Keith Thompson

Assume that in the given application it is acceptable to
wait t seconds for the next random number, then the
possibility that the above procedure will need more time
than t seconds is greater than 0.

It is possible that it will not have terminated before the
end of the universe, albeit by a very little possibility.

True.

A concrete example: Suppose RAND_MAX==32767, and you want numbers
in the range 0..9999. Then this method rejects all rand() results
outside the range 0..29999, and then divides by 3. If rand(),
by chance, returns a long sequence of numbers greater than 29999,
this will call rand() arbitrarily many times before yielding a
useful result.

Presumably there are ways to capture and use the randomness of the
rejected numbers, rather than just ignoring them. This requires
substantially more complex code, but at least in principle it can
place an upper bound on the number of rand() calls required for
each result.

This extra work might make sense for a hard real-time system (i.e.,
a system in which late answers are no better than wrong answers)
that requires high-quality random numbers. Of course such a system
probably shouldn't be using rand() in the first place.

There are probably intermediate methods that can limit the number
of rand() calls needed at the cost of some slight loss of uniformity.

B

Ben Bacarisse

#include <stdio.h>
#include <stdlib.h>

/* returns a uniformly-distributed integral number i,
with min <= i < top and with a number of rand() calls
that is smaller than a certain small number (like 4)
that depends on top-min and RAND_MAX. */

I don't think this holds for negative arguments.
int number( double min, double top )
{ { double step; do
{ step =( top - min )/( RAND_MAX + 1. );
min = min + rand() * step; top = min + step; }
while( min < top ); } return( int )min; }

int main( void )
{ int a[ 51 ];
for( int i = 0; i < 51; ++i )a[ i ]= 0;
for( int i = 0; i < 1e6; ++i )++a[ number( 1, 50 )];
for( int i = 0; i < 51; ++i )
{ printf((( i + 1 )/ 10 * 10 ==( i + 1 ))?"%5d\n" : "%5d ", a[ i ]); }}

B

Ben Bacarisse

Malcolm McLean said:
Absolutely don't.

To the OP: Malcolm has some rather odd ideas about software.
rand() is a bit shuffling function. To get an implementation, look up
an algorithm, and cut and paste the C source.

After checking that you have permission to do that.
Then the random number generator works consistently, whatever platform
your program happens to run on.

It's hard to write fully portable code. Very few people can do it. You
will probably have to check that the code works on your target
implementation. Checking that it really will work consistently across
all platforms is yet more work.

The OP may be happy with a program that works on one platform with no
checking of the source. They may be happy to accept that the call will
have to replaced with that for another platform, should the code ever
have to run there. If the software is well structured, system-specific
code should be easy to replace.

B

Ben Bacarisse

Malcolm McLean said:
As Kaz says, normally that's OK. Some really old implementations
of rand() returned alternating bottom bits so %2 in particular
was dicey, but that's rare now.
Note that normally you want the range 0 to N-1.
I often write

#define uniform() (rand()/(RAND_MAX + 1.0))

which produces a random number on 0 to 1 minus a tiny bit. So N * uniform()
will always produce a number on 0 to N-1.

On a 64-bit system, rand()/(RAND_MAX + 1.0) can be 1.0, so multiplying
by N gives N exactly, maybe leading to some out-of-bounds access.

This potential bug is so common, though, that I can see library authors
will being wary of making RAND_MAX as big as they might otherwise.

Mind you, the 64-bit number space is huge, and on a system with the
usual double precision, the number of problem values that can be
returned by rand() is tiny, so problems will be very rare. But if I
ever have to have a pacemaker fitted, I'd really like to review the code
myself...

<snip>