How do you guys limit rand() results to a range?

Discussion in 'C Programming' started by DFS, Jun 1, 2014.

1. DFSGuest

I've been reading up on rand() for the first time, and found this article:

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121
C=1430690
D=1428133
E=1429276
F=1426874
G=1428952
Total=10000000

the article above gives another way
r = rand() / ((RAND_MAX / N) + 1)
A=1426636
B=1430108
C=1427855
D=1428752
E=1429170
F=1428667
G=1428812
Total=10000000

To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.

(intentionally clunky test) code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

char randomLetter(int runType) {

int i, r;
int countLetters = 7;
int Acounter = 0, Bcounter = 0, Ccounter = 0, Dcounter = 0;
int Ecounter = 0, Fcounter = 0, Gcounter = 0;

srand(time(NULL));

for (i=1; i<=10000000;i++)
{
if (runType == 1) {r = rand() % (countLetters);}
if (runType == 2) {r = rand() / ((RAND_MAX / countLetters) + 1);}
if (r == 0) Acounter++;
if (r == 1) Bcounter++;
if (r == 2) Ccounter++;
if (r == 3) Dcounter++;
if (r == 4) Ecounter++;
if (r == 5) Fcounter++;
if (r == 6) Gcounter++;
}

if (runType == 1) printf("r = rand() %% %d\n", countLetters);
if (runType == 2) printf("r = rand() / ((RAND_MAX / %d) + 1)\n",
countLetters);

printf("A=%d\n",Acounter);
printf("B=%d\n",Bcounter);
printf("C=%d\n",Ccounter);
printf("D=%d\n",Dcounter);
printf("E=%d\n",Ecounter);
printf("F=%d\n",Fcounter);
printf("G=%d\n",Gcounter);
printf("Total=%d\n",Acounter+Bcounter+Ccounter+Dcounter+Ecounter+Fcounter+Gcounter);

return 0;
}

int main(void) {
randomLetter(1);
randomLetter(2);
return 0;
}

DFS, Jun 1, 2014

2. Kaz KylhekuGuest

This is not viable if you require an even disribution. However, if you
have such specific requirements of a PRNG, then rand itself isn't viable.

(So basically, in situations where rand is acceptable, modulo
reduction is okay.)
Modulo doesn't do that. Suppose you have a very good source of 8 bit random
numbers in the range [0,256), and you reduce these, say, modulo 200 to the
range in the range [0, 200). The problem is that basically, this operation
chops off the values in the range [200,256) and folds them to the range
[0,56). So these values are twice as likely to occur.

Every modulus M1 which is not a perfect divisor of the original modulus M0 has
this problem to some extent. Of course, if M1 << M0, the problem is
less noticeable. For instance if [0, 256) is reduced into [0, 5), the
discrepancy between the more densely produced values and less densely
produced values is a lot smaller.

Kaz Kylheku, Jun 1, 2014

3. Malcolm McLeanGuest

As Kaz says, normally that's OK. Some really old implementations
of rand() returned alternating bottom bits so %2 in particular
was dicey, but that's rare now.Note that normally you want the range 0 to N-1.I often write

#define uniform() (rand()/(RAND_MAX + 1.0))

which produces a random number on 0 to 1 minus a tiny bit. So N * uniform()
will always produce a number on 0 to N-1.
However mainly I use uniform() for when I want a random number on 0 to 1,
and the modulus method for random integers.

Malcolm McLean, Jun 1, 2014
4. ais523Guest

If your random number generator is sufficiently random, both methods
give the same result; they limit the result to a range (with the higher
values being slightly less likely to come up, if N doesn't divide
exactly into RAND_MAX).

In some C implementations (especially older ones), the random-number
generators in the standard libraries (rand(), etc.) are not particularly
random, especially in the less significant bits; for instance, some of
them alternate between odd and even numbers. The modulus method for
reducing the range makes this problem worse, whereas the division method
makes it less bad.

So some programmers prefer the modulus method because it's clearer; and
some programmers prefer the division method because it produces the same
results if the random number generator works properly, and better
results if it's broken.

ais523, Jun 1, 2014
5. BartCGuest

Try the same code with a much narrow source of random numbers. For example,
use:

#define MYRAND_MAX 15
int myrand(void) { return rand() & MYRAND_MAX;}

and replace uses of rand() with myrand(), and RAND_MAX with MYRAND_MAX. Then
any non-uniform results are more obvious.

(For my purposes I just use the % method but I'm not fussy about
distribution.)

BartC, Jun 1, 2014
6. Stefan RamGuest

When there are M numbers that can be returned by rand(), and
we make N possible results from that by either »%« or »/« or
any other means that will always map a number m to the same
number n and m cannot be divided by n, then some of these n
numbers will be mapped from ( m div n ) numbers, others from
( m div n )+ 1 numbers.

The correct solution is to use calls of rand() to assemble a
number with decimal places between 0 and 1. The more decimal
places, the better, but usually a few calls of rand() will
suffice. The, this number can be scaled (multiplied) to get
the result.

Stefan Ram, Jun 1, 2014
7. Stefan RamGuest

Supersedes: <-berlin.de>

When there are M numbers that can be returned by rand(), and
we make N possible results from that by either »%« or »/« or
any other means, we will always map the same number m to the
same number n. If m cannot be divided by n, then some of
those n numbers will be mapped from ( m div n ) different
rand() results, others not.

The correct solution is to use (possibly: multiple) calls of
rand() to assemble a number with decimal places, so that the
value of this number is between 0 and 1. The more decimal
places, the better, but usually a few calls of rand() will
suffice. Then, this number can be scaled (multiplied) and
truncated to get the result.

Stefan Ram, Jun 1, 2014
8. Malcolm McLeanGuest

A floating point number consists of a sign, a mantissa, and an
exponent. The sign is easy, it's always positive. The mantissa
can be set to random bits. The exponent is tricky. There's a
50% chance it should be in the range 0.5 - 1.0 (so -1, after
debiasing). There's a 25% chance it should be -2, a 12.5%
chance it should be -3, and so on. If you want denormalised
numbers you've then got to apply a little bit more logic,
because you've got to start shifting down the mantissa. But
that's highly unlikely to be ever executed.

So basically you need to create a random bitstream, then
read it until you get the first non-zero bit. That tells you
what the exponent should be. In almost all cases, that's one
call to rand(), then two calls to fill out the mantissa. So
three calls in total for a double resolution random number.

Malcolm McLean, Jun 1, 2014
9. Rosario193Guest

random unsigned in the range [a, b] b>a would be a+rand()%(b-a)
if rand() is statistical ok

Rosario193, Jun 1, 2014
10. Rosario193Guest

if rand() is not statistical ok one can try something as

a+(rand()>>2)%(b-a)

Rosario193, Jun 1, 2014
11. Stefan RamGuest

For a number between 0 and 1, unsigned fixed-point is fine.

The concatenation of "0." with the results of n rand()
calls, each one giving the value of a digit, already is a
fixed-point number between 0 and 1 in the base of RAND_MAX.

(I already wrote C code for that, but can't find it now, and
I do not have time right now to write it anew. Otherwise,
I'd already had posted a demo. You don't actually need to
implement arithmetics in base RAND_MAX, because these
calculations are just the abstract model, but the
implementation can use simplifications with the same effect.)

Stefan Ram, Jun 1, 2014
12. Stefan RamGuest

Assume that in the given application it is acceptable to
wait t seconds for the next random number, then the
possibility that the above procedure will need more time
than t seconds is greater than 0.

It is possible that it will not have terminated before the
end of the universe, albeit by a very little possibility.

Stefan Ram, Jun 1, 2014
13. Keith ThompsonGuest

Is "a number with decimal places" a verbose way of saying
"floating-point value", or do you mean something else?

I don't think that's the best approach. I'll post a followup to

Keith Thompson, Jun 1, 2014
14. Stefan RamGuest

#include <stdio.h>
#include <stdlib.h>

/* returns a uniformly-distributed integral number i,
with min <= i < top and with a number of rand() calls
that is smaller than a certain small number (like 4)
that depends on top-min and RAND_MAX. */

int number( double min, double top )
{ { double step; do
{ step =( top - min )/( RAND_MAX + 1. );
min = min + rand() * step; top = min + step; }
while( min < top ); } return( int )min; }

int main( void )
{ int a[ 51 ];
for( int i = 0; i < 51; ++i )a[ i ]= 0;
for( int i = 0; i < 1e6; ++i )++a[ number( 1, 50 )];
for( int i = 0; i < 51; ++i )
{ printf((( i + 1 )/ 10 * 10 ==( i + 1 ))?"%5d\n" : "%5d ", a[ i ]); }}

Stefan Ram, Jun 1, 2014
15. Keith ThompsonGuest

[...]

The comp.lang.c FAQ is at http://www.c-faq.com/. You've just asked
question 13.16.

As others have pointed out, you can't generate uniformly distributed
numbers in the range 0..N-1 using a single call to rand() if RAND_MAX+1
is not a multiple of N. Some numbers will occur more often than others.
You can play tricks to change *which* numbers occur more often, but
you can't avoid the problem.

If that doesn't matter, the rand() / ((RAND_MAX / N) + 1) approach is
probably the best.

If it does (and it probably should), you can use the method described in
the FAQ, which requires calling rand() multiple times for each result
(the number of calls will average slightly more than 1).

The FAQ mentions that some rand() implementations return results that
alternate odd vs. even, which is why it advises against the rand() % N
method. I've recently learned that some current implementations still
do this (specifically some of the BSD variants, though Mac OS doesn't
have this problem). The sample implementation given in the C standard
does *not* have this problem -- which makes me wonder why the BSD
implementations haven't been updated.

It's worth noting that rand() implementations, even ones without the
above problem, tend not to produce high-quality random numbers. For one
thing, they're constrained by the requirememt that rand() must always
return the same sequence for a given seed.

There are other non-standard functions that are much better. See your

Keith Thompson, Jun 1, 2014
16. Malcolm McLeanGuest

Absolutely don't.

rand() is a bit shuffling function. To get an implementation, look up
an algorithm, and cut and paste the C source.
Then the random number generator works consistently, whatever platform
your program happens to run on.

Malcolm McLean, Jun 1, 2014
17. Keith ThompsonGuest

True.

A concrete example: Suppose RAND_MAX==32767, and you want numbers
in the range 0..9999. Then this method rejects all rand() results
outside the range 0..29999, and then divides by 3. If rand(),
by chance, returns a long sequence of numbers greater than 29999,
this will call rand() arbitrarily many times before yielding a
useful result.

Presumably there are ways to capture and use the randomness of the
rejected numbers, rather than just ignoring them. This requires
substantially more complex code, but at least in principle it can
place an upper bound on the number of rand() calls required for
each result.

This extra work might make sense for a hard real-time system (i.e.,
a system in which late answers are no better than wrong answers)
that requires high-quality random numbers. Of course such a system
probably shouldn't be using rand() in the first place.

There are probably intermediate methods that can limit the number
of rand() calls needed at the cost of some slight loss of uniformity.

Keith Thompson, Jun 1, 2014
18. Ben BacarisseGuest

I don't think this holds for negative arguments.

Ben Bacarisse, Jun 2, 2014
19. Ben BacarisseGuest

To the OP: Malcolm has some rather odd ideas about software.
After checking that you have permission to do that.
It's hard to write fully portable code. Very few people can do it. You
will probably have to check that the code works on your target
implementation. Checking that it really will work consistently across
all platforms is yet more work.

The OP may be happy with a program that works on one platform with no
checking of the source. They may be happy to accept that the call will
have to replaced with that for another platform, should the code ever
have to run there. If the software is well structured, system-specific
code should be easy to replace.

Ben Bacarisse, Jun 2, 2014
20. Ben BacarisseGuest

On a 64-bit system, rand()/(RAND_MAX + 1.0) can be 1.0, so multiplying
by N gives N exactly, maybe leading to some out-of-bounds access.

This potential bug is so common, though, that I can see library authors
will being wary of making RAND_MAX as big as they might otherwise.

Mind you, the 64-bit number space is huge, and on a system with the
usual double precision, the number of problem values that can be
returned by rand() is tiny, so problems will be very rare. But if I
ever have to have a pacemaker fitted, I'd really like to review the code
myself...

<snip>

Ben Bacarisse, Jun 2, 2014