rand - is RAND_MAX how long before the same # will occur again?

J

Jack

When I use rand(), is the RAND_MAX value how long I am guaranteed that
the same value will not appear twice?

And is this a floating window?

For example, if RAND_MAX is 32767, and I make 500,000 consecutive rand
calls then is the rand() algorithm going to guarantee me that no
floating window of calls over that 500,000 rand calls will have the
same value twice? The only way that would work is if the series was
identical everytime.

I just found some code that someone wrote that uses rand() to create a
unique id, and I'm thinking it would make more sense just to take a 32
bit int, and increment it everytime with a conditional check that
restarts it at 1 when the max value for the int is reached. I cannot
see any obvious reason why rand() was used instead of this approach.
 
B

Ben Pfaff

Jack said:
When I use rand(), is the RAND_MAX value how long I am guaranteed that
the same value will not appear twice?

No. It is very unlikely that no value will appear twice. If the
random number generator is uniformly random, the chance that no
value will appear twice in RAND_MAX calls, assuming RAND_MAX ==
32767, is

(32766 / 32767) * (32765 / 32767) * ... * (1 / 32767)
== 32766! / (32767)**32766
~= 1.347 * 10**-14228 (according to Emacs calc)

This result is much, much smaller than the maximum value for
LDBL_MIN.

(Someone should check my math, I'm not so good at this. Also,
smartasses, the above are math formulas, not C expressions.)
 
B

Barry Schwarz

When I use rand(), is the RAND_MAX value how long I am guaranteed that
the same value will not appear twice?

No, if the numbers are random then it is entirely possible for the
first two to be the same.

RAND_MAX is simply the largest value that rand can return.
And is this a floating window?

What is a floating window?
For example, if RAND_MAX is 32767, and I make 500,000 consecutive rand
calls then is the rand() algorithm going to guarantee me that no
floating window of calls over that 500,000 rand calls will have the
same value twice? The only way that would work is if the series was
identical everytime.

I don't know what you mean here but if you make 500,000 calls then on
average each value will occur 16-17 times.
I just found some code that someone wrote that uses rand() to create a
unique id, and I'm thinking it would make more sense just to take a 32
bit int, and increment it everytime with a conditional check that
restarts it at 1 when the max value for the int is reached. I cannot
see any obvious reason why rand() was used instead of this approach.

Ask the author.


<<Remove the del for email>>
 
O

osmium

Jack said:
I just found some code that someone wrote that uses rand() to create a
unique id, and I'm thinking it would make more sense just to take a 32
bit int, and increment it everytime with a conditional check that
restarts it at 1 when the max value for the int is reached. I cannot
see any obvious reason why rand() was used instead of this approach.

My guess: the author goofed, it's a relatively easy mistake to make. A
reason that has at least *some* plausibility: Information hiding, he didn't
want the users to glean any information from their relative numbers.
 
E

Eric Sosman

Ben said:
No. It is very unlikely that no value will appear twice. If the
random number generator is uniformly random, the chance that no
value will appear twice in RAND_MAX calls, assuming RAND_MAX ==
32767, is

(32766 / 32767) * (32765 / 32767) * ... * (1 / 32767)
== 32766! / (32767)**32766
~= 1.347 * 10**-14228 (according to Emacs calc)

This result is much, much smaller than the maximum value for
LDBL_MIN.

(Someone should check my math, I'm not so good at this. Also,
smartasses, the above are math formulas, not C expressions.)

FWIW, a different tool gives me the same result. If we're
wrong, I at least am in good company.

However, this calculation assumes successive rand() values
are independent, which is certainly not the case. rand() is
required to be deterministic in the sense that it produces the
exact same sequence of values for a given explicit or implied
srand() argument; the srand() argument completely determines
the sequence of subsequent rand() values.

Drifting into implementation specifics, it is also worth
noting that if rand() is a full-period linear congruential
generator it produces a permutation of { 0 .. RAND_MAX }, so
the probability of a repeated value in RAND_MAX calls or even
in RAND_MAX+1 calls is zero! Of course, the probability of a
repetition in RAND_MAX+2 calls rises abruptly to unity. For
a pure multiplicative generator with prime modulus, the generated
values are a permutation of { 1 .. RAND_MAX } and the probability
transition lies between RAND_MAX and RAND_MAX+1 calls.

For the benefit of the O.P. (Ben already knows this), the
Standard does not specify what algorithm underlies rand(), and
different implementations use different generators. It is
certainly an error to assume that rand() will not repeat itself
within RAND_MAX calls; it is even possible that two rand() calls
in a row can return the same value.
 
K

Keith Thompson

Jack said:
When I use rand(), is the RAND_MAX value how long I am guaranteed that
the same value will not appear twice?

You're not. If the values returned by rand() were truly random, there
would be a 1.0/(RAND_MAX+1) chance that two successive calls to rand()
would yield the same value.

Of course rand() returns pseudo-random values, and the sequence is
required to be reproducible for a given seed (the argument passed to
srand().

If the internal state is no bigger than the value returned (e.g., if
RAND_MAX is 32767 and the system stores only 15 bits of internal
state), then rand() can never return the same result twice -- if it
did, it would continue to return that same result indefinitely. In
that case, the results will repeat with a cycle of *at most*
RAND_MAX+1.

If it keeps a larger internal state, the results are going to look
more like real random numbers, though of course they'll still be
deterministic. For example, if there are 1024 bits of internal state,
that makes 2**1024 possible states; srand() lets you select one of
UINT_MAX+1 of those states.

So, depending on the implementation, two successive calls to rand()
might never return the same value, or they might do so once in
RAND_MAX+1 calls (periodically or probabilistically), or they might do
something else if the implementation is more "pseudo" than "random".

[snip]
I just found some code that someone wrote that uses rand() to create a
unique id, and I'm thinking it would make more sense just to take a 32
bit int, and increment it everytime with a conditional check that
restarts it at 1 when the max value for the int is reached. I cannot
see any obvious reason why rand() was used instead of this approach.

If all you care about is uniqueness, incrementing a variable is easier
and more effective than using rand(). If you also want
unpredictability, you can use a random number, but you have to allow
for the possibility of repeated values, probably by keeping track of
all the ids you've already used. RAND_MAX can be as small as 32767,
so it's not useful if you want a large number of unique ids. Finally,
rand() is often not very good, and can be easy to predict if you know
the algorithm; it's almost certainly not suitable for cryptographic
applications.
 
P

pete

Eric said:
It is
certainly an error to assume that rand() will not repeat itself
within RAND_MAX calls; it is even possible that two rand() calls
in a row can return the same value.

It is even possible that *all* rand() calls can return the same value.
 
K

Keith Thompson

pete said:
It is even possible that *all* rand() calls can return the same value.

C99 7.20.2p2:

The rand function computes a sequence of pseudo-random integers in
the range 0 to RAND_MAX.

I don't think you can stretch the meaning of "pseudo-random" to
include a repeated sequence of the same value.

(On the other hand, if it generated truly random values, there would
be a finite probability that it could produce an arbitrarily long
sequence of a single value.)
 
P

pete

Keith said:
C99 7.20.2p2:

The rand function computes a sequence of pseudo-random integers in
the range 0 to RAND_MAX.

I don't think you can stretch the meaning of "pseudo-random" to
include a repeated sequence of the same value.

(On the other hand, if it generated truly random values, there would
be a finite probability that it could produce an arbitrarily long
sequence of a single value.)

It's a quality of implementation issue,
like a malloc that always returns NULL.
There's nothing in the standard which prohibits
a very poor quality rand().
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top