How do you guys limit rand() results to a range?


W

Walter Banks

Keith said:
Walter Banks said:
I did some work on random numbers used in a Monte Carlo
simulator. The one thing to remember is, "A random number
generator will try very hard to not be random".
[...]

Can you expand on that? I'm having trouble thinking of an
interpretation of that last statement that makes sense.
What am I missing?

I know several people that for various reasons have wanted
to modify a random number by limiting to a range or a
particular distribution. Most attempts to do that as Kaz's
earlier comment on modulus pointed out has un intended
side effects that seriously disturbs the resulting expectation.

The same thing has been observed by most people trying to
implement random number generators with specific outputs.
It is someone else's quote but I have forgotten who should be
credited.

BTW range is just one specific case of random numbers
with specific distributions.

What we did to create a random number for a range in the MC
simulator was first create a dependable random number generator
with a distribution from 0..1 and then for a range used

range(n,m) = n + ((m-n) * rand0_1);

A random number with a distribution of 0..1 can be a linear
integer random number but treated like a fractional number.
Do a integer multiply of (m-n) by the random number and
discard the least significant bits equal to the fraction width.
There are a number of easy ways that this can be done in C
to avoid any floating point operations. (double the width of the
int for the multiply or do a fixed point fract by int multiply)

w..
 
Ad

Advertisements

D

DFS

To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.

Modulo doesn't do that. Suppose you have a very good source of 8 bit random
numbers in the range [0,256), and you reduce these, say, modulo 200 to the
range in the range [0, 200). The problem is that basically, this operation
chops off the values in the range [200,256) and folds them to the range
[0,56). So these values are twice as likely to occur.

Can you write a C program to demonstrate that?


Every modulus M1 which is not a perfect divisor of the original modulus M0 has
this problem to some extent. Of course, if M1 << M0, the problem is
less noticeable. For instance if [0, 256) is reduced into [0, 5), the
discrepancy between the more densely produced values and less densely
produced values is a lot smaller.
 
K

Kaz Kylheku

To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.

Modulo doesn't do that. Suppose you have a very good source of 8 bit random
numbers in the range [0,256), and you reduce these, say, modulo 200 to the
range in the range [0, 200). The problem is that basically, this operation
chops off the values in the range [200,256) and folds them to the range
[0,56). So these values are twice as likely to occur.

Can you write a C program to demonstrate that?

Sure, like this:

#include <stdio.h>

int main(void)
{
int freq[200] = { 0 };
int i;

/* Sweep over a precisely even distribution of values from 0 to 255:
namely, a linear ramp. */

for (i = 0; i < 256; i++)
freq[i % 200]++;

/* Now print the frequency of the modulo 200 values: some occurred
twice. The values add up to 256, but are unevenly distributed. */

for (i = 0; i < 200; i++)
printf("freq[%d] = %d\n", i, freq);

return 0;
}
 
N

Nobody

Some really old implementations of rand() returned alternating bottom
bits so %2 in particular was dicey, but that's rare now.

If the least-significant bits of an LCG are used, using the result modulo
2^N for any small N is dicey.

It's not just that the bottom bit will alternate 0,1,0,1,..., but the
bottom 2 bits will follow a cycle of length 4, the bottom 3 bits a cycle
of length 8, and so on.

Some PRNGs intentionally returned the entire state of the LCG so that the
programmer could make the call as to how many lower bits should be
discarded. Which is fine if the programmer knows the issues, but rather
undesirable if they are likely to assume that the values can be treated as
if they were random.
 
P

Phil Carmody

I've looked into this, but I don't think it's possible, at least not
perfectly in the general case. You just get the same problem, recursive.

It's actually quite simple, and doesn't have to be recursive,
you can simply store the unused entropy and carry it on into
the next calculation in an iterative manner. (A bit's a bit's
a bit.)

There are even many ways to do this. The simplest appear to be
recursive, as they never reduce the problem to zero, just to
something neglibile. The ones with no waste alas require more
numerics for their explanation, and I'm shattered, and haven't
needed to code one for a decade, I'd probably mess up an
explanation. Think radix conversion, that's the best hint I
can give presently. Alternatively, think arithmetic coding,
which is probably a better hint as there are "doesn't go on
for ever before making a decision" proofs for that.
To take the simplest possible example, suppose you take one single bit
from each rejected number[1], and string those into a new random number.
You still have the same problem that _this_ number may be greater than
29999. You can then reject it and start constructing a new one, but
there is no way, with a truly uniform RNG, to guarantee that you will
construct any acceptable number in any length of time.

Even using that naive method, you can bound the mean. And if you
feed back the waste back into the system (a bit of entropy's a bit
of entropy, it never actually has to be waste), the average number
of iterations can be kept very low no matter what the ranges are
that you're talking about.
You simply can't construct a completely random number out of bits that
is less than X, unless X is a power of two. And bits are all we have to
work with.

Only if you are looking at worst case rather than average.

Phil
 
P

Phil Carmody

Absolutely don't.

With the implicit blindness in MM's post, absolutely agreed.
If you need high quality random numbers (and you probably don't),
picking a random (PI) one from the net and cut'n'pasting it without
understanding it thoroughly is a very bad idea. For starters, it may not
even be portable to the platform you're porting to! But if you need
HQRN, then you probably already know a better algorithm, and may even
have been given one in the program specification.

If you need medium quality random numbers (which you may), most systems
now have random number functions which are quite adequate. These are
much more likely to be well-debugged than one which you pulled at random
from the net, and almost certain to give better performance on your
implementation, with which, after all, it came packaged.

However, I agree with MM in avoiding system PRNGs is you care
about quality.

Go to the literature. Any post-Marsaglia era paper, containing
algorithms which can compare favourably to GM's KISS variants
should be a good starting point. Most I've seen have been 100%
C99, all operations fully defined by the standard. And therefore
portable to anywhere you can find a modern-enough compiler.

(And by saying post-Marsaglia, I don't mean to besmirch George
in his absense. George wiped pretty much everything prior off
the table with his work, and so there's little point in considering
pre-Marsaglia. He set the modern standards to beat.)
If you only need low quality random numbers (which most of us do, most
of the time), rand(), on a decent implementation, will be good enough.

I hate to say it, but occasionally I do use rand(). But I know
it's really crappy. Sometimes really crappy is actually good
enough.

If I actually care one whit about randomness, however, rand()
doesn't even enter into my field of view.

Phil
 
Ad

Advertisements

P

Phil Carmody

Keith Thompson said:
I haven't, but for some values of N the difference can be *very*
significant.

One extreme case is RAND_MAX==32767 and N==32768. All but one of
the 32769 values you want

I think you have a fencepost error in N.
(assuming a decent rand() implementation)
will appear equally often; the one remaining value, whichever one
it is, will never appear.

So roll twice.
We know (a^2-b^2)=(a+b)(a-b), so 32686^2 = (32768+1)*(32768-1) + 1
Almost never a need to repeat (one time in a billion).

Phil
 
J

James Kuyper

(e-mail address removed) (Richard Bos) writes: ....

Only if you are looking at worst case rather than average.

What if the worst case actually matters (as can happen for real-time
processes)?
I'll grant you, as somebody has already pointed, "Real Time" and an
exactly even probability distribution is a very unlikely combination of
needs. However, if it does come up, can it be met?
 
K

Keith Thompson

Phil Carmody said:
It's actually quite simple, and doesn't have to be recursive,
you can simply store the unused entropy and carry it on into
the next calculation in an iterative manner. (A bit's a bit's
a bit.)

Yes, a bit's a bit's a bit, but we're talking about fractional bits
here.

With RAND_MAX==32767, rand() gives you 15 bits; for a number in the
range 0..9999, you need approximately 13.2877 (log 10000 / log 2) bits
per result.

[...]
Only if you are looking at worst case rather than average.

I think the worst case was exactly what we were looking at.

If you're not worried about the worst case, simply discarding
out-of-range results may be good enough (unless rand() is very slow).
 
S

Stefan Ram

James Kuyper said:
I'll grant you, as somebody has already pointed, "Real Time" and an
exactly even probability distribution is a very unlikely combination of
needs. However, if it does come up, can it be met?

When one is using »real« random numbers, an exactly
even probability distribution is very unlikely.

BTW: Only since as recent as 2010 we have

»evidence that quantum randomness is indeed
incomputable. That means that it could not
have been be generated by a computer.«

http://www.technologyreview.com/blog/arxiv/25041/

»Ref: arxiv.org/abs/1004.1521:

Experimental Evidence of Quantum Randomness
Incomputability«

http://arxiv.org/abs/1004.1521
 
J

James Kuyper

When one is using �real� random numbers, an exactly
even probability distribution is very unlikely.

Agreed - that's one reason why, for certain purposes, sufficiently
sophisticated pseudo-random number generators can actually be more
useful than really-random numbers, because it's easier to control the
distribution (and it's a lot easier to repeat a sequence).
BTW: Only since as recent as 2010 we have

�evidence that quantum randomness is indeed
incomputable. That means that it could not
have been be generated by a computer.�

http://www.technologyreview.com/blog/arxiv/25041/

�Ref: arxiv.org/abs/1004.1521:

Experimental Evidence of Quantum Randomness
Incomputability�

http://arxiv.org/abs/1004.1521

As someone with a stronger academic background in theoretical physics
than in computer science, I don't find the summaries of those articles
particularly surprising. Physicists have long assumed that quantum
randomness is real, it's not just a way of describing our uncertainty
about the values of "hidden variables". Any pseudo-random generator
necessarily uses the equivalent of what physicists are talking about
when they speak of "hidden variables". Experimental confirmation is nice
to have, but not surprising (assuming those articles are correct - I'm
agnostic on that issue).

Those articles also don't seem particularly connected to your comments
in your first paragraph - were they intended to be?
 
Ad

Advertisements

R

Rosario193

I've been reading up on rand() for the first time, and found this article:

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx


On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121

random unsigned in the range [a, b] b>a would be a+rand()%(b-a)

if the range is [a, b] b>a and b!=unsigned max = (unsigned) -1
if(b<a) return -1;
if(b-a>RAND_MAX) return -1;
else return a+rand()%(b-a);
 
R

Rosario193

I've been reading up on rand() for the first time, and found this article:

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx


On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121

random unsigned in the range [a, b] b>a would be a+rand()%(b-a)

if the range is [a, b] b>a and b!=unsigned max = (unsigned) -1
if(b<a) return -1;
if(b-a>RAND_MAX) return -1;
else return a+rand()%(b-a);
if rand() is not statistical ok one can try something as

a+(rand()>>2)%(b-a)

it is easy

[0, RAND_MAX] [a, b]

step=Rand_Max/(b-a)

x=rand()

x<= step return a
x<=2*step return a+1
x<=3*step return a+2
.....
x<=b*(b-a)/RandMax * step return b
 
R

Rosario193

it is easy

[0, RAND_MAX] [a, b]

step=Rand_Max/(b-a)

x=rand()

x<= step return a
x<=2*step return a+1
x<=3*step return a+2
....
x<=b*(b-a)/RandMax * step return b

the last line should be

x<=(b-a) * step return b

so would be something as

x=rand(); y=x/step; return y;
 
R

Rosario193

it is easy
[0, RAND_MAX] [a, b]

step=Rand_Max/(b-a)

x=rand()

x<= step return a
x<=2*step return a+1
x<=3*step return a+2
....
x<=b*(b-a)/RandMax * step return b

the last line should be

x<=(b-a) * step return b

so would be something as

x=rand(); y=x/step; return y;

x=rand(); y=a+x/step; return y;

[RandMax/step= RandMax * (b-a)/RandMax=b-a ]
 
R

Rosario193

I've been reading up on rand() for the first time, and found this article:

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx


On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121

random unsigned in the range [a, b] b>a would be a+rand()%(b-a)
if rand() is statistical ok
this not add anything

if [a,b] b>a is the range
there are II Ways
a+rand()%(b-a)

if step=RAND_MAX/(b-a) seems in the test ok
x=a+rand()/step
if(x>b) return b
too
 
Ad

Advertisements

R

Rosario193

I've been reading up on rand() for the first time, and found this article:

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx


On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121
C=1430690
D=1428133
E=1429276
F=1426874
G=1428952
Total=10000000


the article above gives another way
r = rand() / ((RAND_MAX / N) + 1)

Why is it "rand()/((RAND_MAX/N) + 1)" and not rand()/(RAND_MAX/N)

if whe have 7 letters

0 1 2 3 4 5 6 7
1 2 3 4 5 6 7

there are 7 intervals and not 8
 
R

Rosario193

Why is it "rand()/((RAND_MAX/N) + 1)" and not rand()/(RAND_MAX/N)

if whe have 7 letters

0 1 2 3 4 5 6 7
1 2 3 4 5 6 7

there are 7 intervals and not 8

i make some error all ok

7 letters

0 1 2 3 4 5 6
1 2 3 4 5 6 7

7 intervals
 
Ad

Advertisements

R

Rosario193

I've been reading up on rand() for the first time, and found this article:

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx


On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121
C=1430690
D=1428133
E=1429276
F=1426874
G=1428952
Total=10000000


the article above gives another way
r = rand() / ((RAND_MAX / N) + 1)

the last question:

Why is it "rand()/((RAND_MAX/N) + 1)" and not rand()/(RAND_MAX/N)

if whe have 7 letters

0 1 2 3 4 5 6
1 2 3 4 5 6

there are 6 intervals and not 7
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top