How do you guys limit rand() results to a range?


G

glen herrmannsfeldt

(snip)
The comp.lang.c FAQ is at http://www.c-faq.com/. You've just asked
question 13.16.
As others have pointed out, you can't generate uniformly distributed
numbers in the range 0..N-1 using a single call to rand() if RAND_MAX+1
is not a multiple of N. Some numbers will occur more often than others.
You can play tricks to change *which* numbers occur more often, but
you can't avoid the problem.

Did anyone here ever figure out how many random numbers you need
to look at before the difference is statistically significant?

For one, if you have N random numbers between 0 and N-1, what
is the probability that all N are generated? (Assume that the
generator is actually random.)

If you flip a coin N times, what is the probability of getting N/2
heads. (Note that it is zero for all odd N.)

-- glen
 
Ad

Advertisements

K

Keith Thompson

Malcolm McLean said:
Absolutely don't.

rand() is a bit shuffling function. To get an implementation, look up
an algorithm, and cut and paste the C source.
Then the random number generator works consistently, whatever platform
your program happens to run on.

There's nothing wrong with using a library function in the usual
manner, by calling it rather than by recompiling it from source.

If you need to use it on a platform where it's not already available,
then yes, recompiling from source is a decent approach (assuming
the source code itself is portable).
 
K

Keith Thompson

glen herrmannsfeldt said:
Did anyone here ever figure out how many random numbers you need
to look at before the difference is statistically significant?

I haven't, but for some values of N the difference can be *very*
significant.

One extreme case is RAND_MAX==32767 and N==32768. All but one of
the 32769 values you want (assuming a decent rand() implementation)
will appear equally often; the one remaining value, whichever one
it is, will never appear. Or, with RAND_MAX==32767 and N==32766,
one value will appear twice as often as the others. With slightly
smaller N, several values will appear twice as often as the others.
For one, if you have N random numbers between 0 and N-1, what
is the probability that all N are generated? (Assume that the
generator is actually random.)

If you flip a coin N times, what is the probability of getting N/2
heads. (Note that it is zero for all odd N.)

That's not hard to solve, but I'll leave it to someone else.
 
J

James Kuyper

(snip)



Did anyone here ever figure out how many random numbers you need
to look at before the difference is statistically significant?

At best, with R = RAND_MAX/N, you'll have some numbers generated with a
frequency that is (R+1)/R greater than others. If you know which numbers
fall into each category, detecting that difference would require on the
order of R^2 samples - which isn't a spectacularly large number by the
standards of serious scientific simulations. A much larger sample size
would be needed if determining which numbers fall into each category is
part of what needs to be detected.
For other, less demanding purposes, the uneven distribution is probably
not much of a problem, at least for sufficiently large values of R.
 
M

Malcolm McLean

It's hard to write fully portable code. Very few people can do it.
It is yes. As you saw, there was a portability bug in something as
simple as uniform().
You will probably have to check that the code works on your target
implementation. Checking that it really will work consistently across
all platforms is yet more work.
An unfortunate reality.

The OP may be happy with a program that works on one platform with no
checking of the source. They may be happy to accept that the call will
have to replaced with that for another platform, should the code ever
have to run there. If the software is well structured, system-specific
code should be easy to replace.
Hmm.
"Someone please write an random number generator, in C, with equivalent
statistical properties to a unix random()". It depends what environment you're
working in, that could be easy, or it could be very difficult.
 
B

Ben Bacarisse

Malcolm McLean said:
Hmm.
"Someone please write an random number generator, in C, with equivalent
statistical properties to a unix random()". It depends what environment you're
working in, that could be easy, or it could be very difficult.

The requirement will be to find (not necessarily write) a random number
generator that works where the original one did. This need not have the
same statistical properties. The minimum requirements might be rather
modest. You don't know if the OP is implementing an X.509 certification
authority or a live desktop wallpaper with randomly twinkling stars.
 
Ad

Advertisements

M

Malcolm McLean

The requirement will be to find (not necessarily write) a random number
generator that works where the original one did. This need not have the
same statistical properties. The minimum requirements might be rather
modest. You don't know if the OP is implementing an X.509 certification
authority or a live desktop wallpaper with randomly twinkling stars.
We know that rand() isn't good enough, so either he's very fussy about
the aesthetic properties of the twinkling stars, or it's the former.

So it's a lot easier to have an RNG which is a bit-shuffing function, known
to produce identical results wherever we run it, shipped with the
program as portable ANSI C source, and not in the same source file
as an IO procedure which accesses /dev/random.
Then as long as the company has the skills to compile C source, they
should have the skills to use the program provided.
 
W

Walter Banks

DFS said:
I've been reading up on rand() for the first time, and found this article:

http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx

On web, saw lots of recommendations to use modulo
to limit the result set from 0 to N.

r = rand() % N
A=1428954
B=1427121
C=1430690
D=1428133
E=1429276
F=1426874
G=1428952
Total=10000000

the article above gives another way
r = rand() / ((RAND_MAX / N) + 1)
A=1426636
B=1430108
C=1427855
D=1428752
E=1429170
F=1428667
G=1428812
Total=10000000

To my uninformed eye, both ways produce well-distributed random results
within a 10 million size test loop.

I did some work on random numbers used in a Monte Carlo
simulator. The one thing to remember is, "A random number
generator will try very hard to not be random".

To test randomness is very difficult. Randomness has many
properties. It starts with linear distribution and then distribution
of numbers in pairs and so on.

One good book on random number and random number
statistical distributions is "Statistical Distributions" available
in a number editions by Hastings and Peacock et al

w..
 
R

Richard Bos

Malcolm McLean said:
Absolutely don't.

rand() is a bit shuffling function. To get an implementation, look up
an algorithm, and cut and paste the C source.
Then the random number generator works consistently, whatever platform
your program happens to run on.

Absolutely don't.

If you need high quality random numbers (and you probably don't),
picking a random (PI) one from the net and cut'n'pasting it without
understanding it thoroughly is a very bad idea. For starters, it may not
even be portable to the platform you're porting to! But if you need
HQRN, then you probably already know a better algorithm, and may even
have been given one in the program specification.

If you need medium quality random numbers (which you may), most systems
now have random number functions which are quite adequate. These are
much more likely to be well-debugged than one which you pulled at random
from the net, and almost certain to give better performance on your
implementation, with which, after all, it came packaged.

If you only need low quality random numbers (which most of us do, most
of the time), rand(), on a decent implementation, will be good enough.

Richard
 
R

Richard Bos

Malcolm McLean said:
We know that rand() isn't good enough, so either he's very fussy about
the aesthetic properties of the twinkling stars, or it's the former.

No, we don't. To quote:

# To my uninformed eye, both ways produce well-distributed random
# results within a 10 million size test loop.

That sounds to me like he thinks rand() _is_ good enough for him, and
he's just wondering what the fuss is all about.

He's probably right, too.

Richard
 
B

Ben Bacarisse

Malcolm McLean said:
We know that rand() isn't good enough, so either he's very fussy about
the aesthetic properties of the twinkling stars, or it's the former.

Who's "we", and how do "we" know this? I don't recall the OP ever saying
that rand() is not good enough.

<snip>
 
Ad

Advertisements

D

DFS

No, we don't. To quote:

# To my uninformed eye, both ways produce well-distributed random #
results within a 10 million size test loop.

That sounds to me like he thinks rand() _is_ good enough for him,
and he's just wondering what the fuss is all about.

It's definitely good enough for the task that launched my exploration of
random number generation in C (randomly pick a name from a list of 10).

He's probably right, too.

I enjoyed the discussion, and did some further reading. Looks like the
Mersenne Twister is far better than rand().

Here's an interesting site:
http://www.random.org/
 
B

Ben Bacarisse

DFS said:
I enjoyed the discussion, and did some further reading. Looks like
the Mersenne Twister is far better than rand().

rand() may be a Mersenne Twister implementation (though the API will
hamper the seeding).
 
J

James Kuyper

On 6/2/2014 10:23 AM, Richard Bos wrote: ....

It's definitely good enough for the task that launched my exploration of
random number generation in C (randomly pick a name from a list of 10).

That depends entirely upon what the randomly selected name is to be used
for. You can't determine whether a given approach produces sufficiently
equal probabilities, without first knowing precisely why it it (or is
not) important that the probabilities of the different choices be equal.
 
K

Keith Thompson

Walter Banks said:
I did some work on random numbers used in a Monte Carlo
simulator. The one thing to remember is, "A random number
generator will try very hard to not be random".
[...]

Can you expand on that? I'm having trouble thinking of an
interpretation of that last statement that makes sense.
What am I missing?
 
M

Martin Shobe

(snip)



Did anyone here ever figure out how many random numbers you need
to look at before the difference is statistically significant?

I don't think there's a simple answer, it's going to depend rather
heavily on the test being performed.

For example. If we know which ones are bad, we can design our test as a
proportion. Here we will use N = 10000 and RAND_MAX = 32767. There
should be 2768 numbers that are more likely than the remaining 7232. If
the numbers were correctly distributed we would expect this proportion
to be .2768 while the actual proportion is .337890625. You only need 344
samples to detect this with a confidence of 95% and a power of 80%. (For
those who don't know, the confidence here is the chance of a sample from
the correctly distributed numbers to pass the test while the power is
the chance of the incorrectly distributed numbers to fail the test. A
sample passes the test if we consider it to be correctly distributed).

On the other hand, if the incorrectly distributed numbers are evenly
split between the two groups, the actual proportion is now
..295654296875. You would need a sample size of 3492 to detect that
difference with a confidence of 95% and a power of 80%.
For one, if you have N random numbers between 0 and N-1, what
is the probability that all N are generated? (Assume that the
generator is actually random.)
N!/(N^N).

If you flip a coin N times, what is the probability of getting N/2
heads. (Note that it is zero for all odd N.)

if N is odd, 0.
if N is even, N!/(2^N(N/2)!(N/2)!).

Martin Shobe
 
Ad

Advertisements

R

Richard Bos

Keith Thompson said:
A concrete example: Suppose RAND_MAX==32767, and you want numbers
in the range 0..9999. Then this method rejects all rand() results
outside the range 0..29999, and then divides by 3. If rand(),
by chance, returns a long sequence of numbers greater than 29999,
this will call rand() arbitrarily many times before yielding a
useful result.

Presumably there are ways to capture and use the randomness of the
rejected numbers, rather than just ignoring them. This requires
substantially more complex code, but at least in principle it can
place an upper bound on the number of rand() calls required for
each result.

I've looked into this, but I don't think it's possible, at least not
perfectly in the general case. You just get the same problem, recursive.

To take the simplest possible example, suppose you take one single bit
from each rejected number[1], and string those into a new random number.
You still have the same problem that _this_ number may be greater than
29999. You can then reject it and start constructing a new one, but
there is no way, with a truly uniform RNG, to guarantee that you will
construct any acceptable number in any length of time.
You simply can't construct a completely random number out of bits that
is less than X, unless X is a power of two. And bits are all we have to
work with.
There are probably intermediate methods that can limit the number
of rand() calls needed at the cost of some slight loss of uniformity.

Certainly. Even the above method gives you the option of having a tiny
chance of never finishing (only with a perfect RNG), or an epsilon-but-
not-zero difference from uniformity.

Of course, most RNGs (and all PRNGs, which is what we're really talking
about) _are not_ perfectly uniform, _will_ cycle through all possible
outputs sooner or later (ideally several times in different orders), and
therefore _will_ produce an acceptable output well before the cows come
home. So in practice that's all right unless security reasons demand
that the timing between random numbers is perfectly uniform - in which
case, you shouldn't be taking hints from this thread, but giving them
(and probably using completely different methods).

Richard

[1] Not the most significant one, you clot!
 
K

Keith Thompson

Keith Thompson said:
A concrete example: Suppose RAND_MAX==32767, and you want numbers
in the range 0..9999. Then this method rejects all rand() results
outside the range 0..29999, and then divides by 3. If rand(),
by chance, returns a long sequence of numbers greater than 29999,
this will call rand() arbitrarily many times before yielding a
useful result.

Presumably there are ways to capture and use the randomness of the
rejected numbers, rather than just ignoring them. This requires
substantially more complex code, but at least in principle it can
place an upper bound on the number of rand() calls required for
each result.

I've looked into this, but I don't think it's possible, at least not
perfectly in the general case. You just get the same problem, recursive.

To take the simplest possible example, suppose you take one single bit
from each rejected number[1], and string those into a new random number.
You still have the same problem that _this_ number may be greater than
29999. You can then reject it and start constructing a new one, but
there is no way, with a truly uniform RNG, to guarantee that you will
construct any acceptable number in any length of time.
You simply can't construct a completely random number out of bits that
is less than X, unless X is a power of two. And bits are all we have to
work with.

I think you're right.

If RAND_MAX==32767 then each call gives you exactly 15 bits of
randomness. In principle, you should be able to treat a sequence
of results as a continuous stream of randomness from which you can
extract random numbers in any range you choose, using exactly as
many bits as you need (not necessarily a whole number) for each one.
In practice, though, since we can only store and manipulate discrete
values (even in floating-point), I think any method must still be
vulnerable to an unlikely sequence that would cause it to break.

I suppose you could use arbitrary-precision rational numbers to store
the pending randomness, but then you lose performance and risk running
out of memory. (I don't seriously suggest that as a solution.)

[...]
 
G

glen herrmannsfeldt

Keith Thompson said:
Walter Banks said:
I did some work on random numbers used in a Monte Carlo
simulator. The one thing to remember is, "A random number
generator will try very hard to not be random". [...]

Can you expand on that? I'm having trouble thinking of an
interpretation of that last statement that makes sense.
What am I missing?

Not completely understanding it either, it reminds me of what Knuth
says about RNGs in TAOCP vol. 2, (and, being some time since I last
read it) that you don't want to randomly select the algorithm for
a RNG.

In other words, you don't want randomness in the algorithm itself.

It might be human (hopefully not CS) intuition that more
randomness in the algorithm would generate more random results.

Seems to me that Knuth and Walter disagree.

-- glen
 
Ad

Advertisements

I

Ike Naar

It might be human (hopefully not CS) intuition that more
randomness in the algorithm would generate more random results.

What's the abbreviation CS? (common sense? computing science?)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top