Way for computing random primes in standard C.

J

Jordan Abel

Rod Pemberton wrote: [...]
"As I've stated previously, the randomness is in the non-perfect algorithm
in rand(). But, the set of numbers generated by rand() is affected by
srand().

No, it's not. There is only ONE sequence.
srand() doesn't affect the randomness of values that rand()
generates, it only changes the set of generated numbers.

It does not, as there is only ONE sequence. What srand() does
is change your position in the sequence.

No, that's not correct.

C99 7.20.2.2p2 says:

The srand function uses the argument as a seed for a new sequence
of pseudo-random numbers to be returned by subsequent calls to
rand. If srand is then called with the same seed value, the
sequence of pseudo-random numbers shall be repeated. If rand is
called before any calls to srand have been made, the same sequence
shall be generated as when srand is first called with a seed value
of 1.

Note the phrase "a new sequence". It's possible, but not required,
that the "new sequence" matches some other sequence at a different
position. It's also possible that it doesn't. For example, there may
be no overlap between the sequence created by srand(1) and the one
created by srand(2) (this is particularly likely if rand()'s internal
state has more bits than the unsigned int argument to srand()).

Not quite - Let's suppose, naively, that srand() merely initializes a
particular set of bits of rand()'s internal state, and sets the
remaining bits to zero. Now, if at any point in srand(1)'s sequence, the
internal state contains the result of srand(2) [reasonable if it's
"perfect" i.e. cycles through every possible combination of bits the
internal state could have], srand(1) and srand(2) are thus two different
points in the same sequence.
 
K

Keith Thompson

Jordan Abel said:
C99 7.20.2.2p2 says:

The srand function uses the argument as a seed for a new sequence
of pseudo-random numbers to be returned by subsequent calls to
rand. If srand is then called with the same seed value, the
sequence of pseudo-random numbers shall be repeated. If rand is
called before any calls to srand have been made, the same sequence
shall be generated as when srand is first called with a seed value
of 1.

Note the phrase "a new sequence". It's possible, but not required,
that the "new sequence" matches some other sequence at a different
position. It's also possible that it doesn't. For example, there may
be no overlap between the sequence created by srand(1) and the one
created by srand(2) (this is particularly likely if rand()'s internal
state has more bits than the unsigned int argument to srand()).

Not quite - Let's suppose, naively, that srand() merely initializes a
particular set of bits of rand()'s internal state, and sets the
remaining bits to zero. Now, if at any point in srand(1)'s sequence, the
internal state contains the result of srand(2) [reasonable if it's
"perfect" i.e. cycles through every possible combination of bits the
internal state could have], srand(1) and srand(2) are thus two different
points in the same sequence.

Yes, I think you're right.

Suppose rand() has N bits of internal state. If it cycles through all
2**N possible internal states, then you can think of the set of all
states as a closed loop with 2**N nodes; each call to rand() advances
on position along the loop, and each call to srand() jumps to a single
point on the loop. (Ideally the points for different values of
srand() are roughly equally spaced along the loop, maximizing the
uniqueness of each subsequence.)

If the cycle is shorter than 2**N, then the states form some number of
disjoint loops rather than one single loop. A call to rand() would
advance by one position on a single loop, and a call to srand() could
jump to an arbitrary point on any loop.

But as you point out, an RNG that cycles through its entire state is
probably better than one that doesn't (but one that doesn't is
certainly permitted by the standard).
 
J

John F

Keith said:
C99 7.20.2.2p2 says:

The srand function uses the argument as a seed for a new sequence
of pseudo-random numbers to be returned by subsequent calls to
rand. If srand is then called with the same seed value, the
sequence of pseudo-random numbers shall be repeated. If rand is
called before any calls to srand have been made, the same sequence
shall be generated as when srand is first called with a seed value
of 1.

Note the phrase "a new sequence". It's possible, but not required,
that the "new sequence" matches some other sequence at a different
position. It's also possible that it doesn't. For example, there may
be no overlap between the sequence created by srand(1) and the one
created by srand(2) (this is particularly likely if rand()'s internal
state has more bits than the unsigned int argument to srand()).

When it's stated that
"...the Mersenne Twister as the core generator.
It produces 53-bit precision floats and has a period of 2**19937-1."

Doesn't it mean that there are 2**19937-1 states and if one were
to actually call it 2**19937 times, then one would be right back
where one started regardless of what number srand() was called with?

Practically: Yes. Did you actually calculate this _high_ exponential?
for each 3.32 in the exponent you may approximately add 1 to an exponent of
10 ...
so you end up with a number of 6000+ Digits.
I was not implying that srand(1) and srand(2) have adjacent states.
But doesn't periodic mean that at whatever state # srand(1) starts in,
it must eventually reach the same state that srand(2) starts in?
Exactly.

So IF the PRNG is periodic AND the same seed gives the same
sequence THEN sequences generated by diffrent srand() calls
must overlap. If wrong, why?

On the long run I can see this too. Imagine a cycle with two different
starting points...
The standard doen't imply that the PRNG must be periodic, does it?
But are there any puely mathematical ones that aren't?

PRNG implys imperfection. Use Blum-Blum-Shub
(http://en.wikipedia.org/wiki/Blum_Blum_Shub) or the Mersenne Twister.
Finite precision will always lead to cycles. Period. Somewhen in a very long
run you'll produce the exactly same value that you started off with...
Mathematically you could even use the logistic equation with infinite
precision, of course.

regards
John
 
F

Flash Gordon

Keith said:
Jordan Abel said:
C99 7.20.2.2p2 says:

The srand function uses the argument as a seed for a new sequence
of pseudo-random numbers to be returned by subsequent calls to
rand. If srand is then called with the same seed value, the
sequence of pseudo-random numbers shall be repeated. If rand is
called before any calls to srand have been made, the same sequence
shall be generated as when srand is first called with a seed value
of 1.

Note the phrase "a new sequence". It's possible, but not required,
that the "new sequence" matches some other sequence at a different
position. It's also possible that it doesn't. For example, there may
be no overlap between the sequence created by srand(1) and the one
created by srand(2) (this is particularly likely if rand()'s internal
state has more bits than the unsigned int argument to srand()).
Not quite - Let's suppose, naively, that srand() merely initializes a
particular set of bits of rand()'s internal state, and sets the
remaining bits to zero. Now, if at any point in srand(1)'s sequence, the
internal state contains the result of srand(2) [reasonable if it's
"perfect" i.e. cycles through every possible combination of bits the
internal state could have], srand(1) and srand(2) are thus two different
points in the same sequence.

Yes, I think you're right.

Suppose rand() has N bits of internal state. If it cycles through all
2**N possible internal states, then you can think of the set of all
states as a closed loop with 2**N nodes; each call to rand() advances
on position along the loop, and each call to srand() jumps to a single
point on the loop. (Ideally the points for different values of
srand() are roughly equally spaced along the loop, maximizing the
uniqueness of each subsequence.)

If the cycle is shorter than 2**N, then the states form some number of
disjoint loops rather than one single loop. A call to rand() would
advance by one position on a single loop, and a call to srand() could
jump to an arbitrary point on any loop.

But as you point out, an RNG that cycles through its entire state is
probably better than one that doesn't (but one that doesn't is
certainly permitted by the standard).

Here's an idea. Isn't Pi meant to be a non-repeating series of digits?
See http://www.piworld.de/pi-statistics/ So how about for your PRNG
calculating the digits of Pi in base RAND_MAX+1?

srand could then just select where to start.

This would not be good enough for security, since if anyone knows that
it is producing the digits of Pi all they have to do is identify where
in the sequence you are, but for other uses it should produce an
infinite (as far as we know) sequence of numbers with very good
statistical properties.

It might also not be the most efficient method of producing random numbers.

Note that I am *not* an expert on statistics, Pi, or PRNGs.
 
B

Ben Bacarisse

The problem is that getting independent sources of entropy usually
involves getting your hands dirty. time (NULL) and getpid () are two
common sources, but that's only 64 bits worth.

time and getpid will give you way less than 64 bits of entropy. The
exact amount will depend on the program and system execution pattern so I
would not like to hazard a guess, but I'd be surprised if together you
could get more than a handful of bits of entropy from them.
Other common practical
sources are things like the value of clock() in response to input device
interrupts (like keyboard or mouse inputs.)

<OT>Many systems have a driver that "harvests" the entropy from such
events. If you don't have this, hashing the system's process table
might be easier. said:
As one final comment -- using the ANSI C's rand() is bad because the
state size is so small

I don't think the standard mandates any state size, does it? I don't
think there is nothing to stop rand() being a very high quality generator.
 
J

Jordan Abel

Keith said:
Jordan Abel said:
C99 7.20.2.2p2 says:

The srand function uses the argument as a seed for a new sequence
of pseudo-random numbers to be returned by subsequent calls to
rand. If srand is then called with the same seed value, the
sequence of pseudo-random numbers shall be repeated. If rand is
called before any calls to srand have been made, the same sequence
shall be generated as when srand is first called with a seed value
of 1.

Note the phrase "a new sequence". It's possible, but not required,
that the "new sequence" matches some other sequence at a different
position. It's also possible that it doesn't. For example, there may
be no overlap between the sequence created by srand(1) and the one
created by srand(2) (this is particularly likely if rand()'s internal
state has more bits than the unsigned int argument to srand()).
Not quite - Let's suppose, naively, that srand() merely initializes a
particular set of bits of rand()'s internal state, and sets the
remaining bits to zero. Now, if at any point in srand(1)'s sequence, the
internal state contains the result of srand(2) [reasonable if it's
"perfect" i.e. cycles through every possible combination of bits the
internal state could have], srand(1) and srand(2) are thus two different
points in the same sequence.

Yes, I think you're right.

Suppose rand() has N bits of internal state. If it cycles through all
2**N possible internal states, then you can think of the set of all
states as a closed loop with 2**N nodes; each call to rand() advances
on position along the loop, and each call to srand() jumps to a single
point on the loop. (Ideally the points for different values of
srand() are roughly equally spaced along the loop, maximizing the
uniqueness of each subsequence.)

If the cycle is shorter than 2**N, then the states form some number of
disjoint loops rather than one single loop. A call to rand() would
advance by one position on a single loop, and a call to srand() could
jump to an arbitrary point on any loop.

But as you point out, an RNG that cycles through its entire state is
probably better than one that doesn't (but one that doesn't is
certainly permitted by the standard).

Here's an idea. Isn't Pi meant to be a non-repeating series of digits?
See http://www.piworld.de/pi-statistics/ So how about for your PRNG
calculating the digits of Pi in base RAND_MAX+1?

srand could then just select where to start.

This would not be good enough for security, since if anyone knows that
it is producing the digits of Pi all they have to do is identify where
in the sequence you are, but for other uses it should produce an
infinite (as far as we know) sequence of numbers with very good
statistical properties.

It might also not be the most efficient method of producing random numbers.

If RAND_MAX+1 is a power of 2, it could be done (based on the fact that
a formula exists to independently calculate hex digits of pi)

You'd probably want srand() to initialize the upper bits of the state
[said state would be an index into the "digits" in that case] rather
than the lower ones.
Note that I am *not* an expert on statistics, Pi, or PRNGs.

So basically neither of us knows how good it would really be.
 
W

Walter Roberson

When it's stated that
"...the Mersenne Twister as the core generator.
It produces 53-bit precision floats and has a period of 2**19937-1."
Doesn't it mean that there are 2**19937-1 states and if one were
to actually call it 2**19937 times, then one would be right back
where one started regardless of what number srand() was called with?

Not necessarily.
I was not implying that srand(1) and srand(2) have adjacent states.
But doesn't periodic mean that at whatever state # srand(1) starts in,
it must eventually reach the same state that srand(2) starts in?

Not necessarily.

It would be possible to the initial seed to affect parameters
in the PRNG, such that srand(1) and srand(2) each had the same
period, but that the cyles for the two were not the same.


Getting further OT:

It appears to me that the traditional Unix drand48() might be
an instance of this. According to the man page:

The initializer function srand48 sets the high-order 32 bits of Xi
to the 32 bits contained in its argument. The low-order 16 bits of
Xi are set to the arbitrary value 330E16.

and

The value returned by any of the functions drand48, erand48,
lrand48, nrand48, mrand48, or jrand48 is computed by first
generating the next 48-bit Xi in the sequence. Then the
appropriate number of bits, according to the type of data item to
be returned, are copied from the high-order (leftmost) bits of Xi
and transformed into the returned value.


If we put these together, we would see that srand(1) and srand(2)
would have the same cycles -only- if it happened the the linear
congruential generator applied to 0x1330E16 happened to result
in 0x2330E16 at some point. Further analysis would be needed to
see whether that ever happened -- the default multiplier and
constant for drand48() and kin are both even, so it is not
the usual case that "every possible n-bit value will be generated,
eventually".
 
J

Jordan Abel

I don't think the standard mandates any state size, does it? I don't
think there is nothing to stop rand() being a very high quality generator.

I think he's referring to the [non-normative] example given in the
standard.
 
N

Nelu

Keith said:
I'm no expert on RNGs either, but my understanding is that piling
arbitrary stuff on top of an existing RNG is not a good approach.
It's unlikely to give you better results than just using the base RNG
directly, assuming the RNG was designed at all competently in the
first place. A given RNG may not be perfect, but arbitrary minor
tweaks to it will probably make it worse. (I think Knuth writes about
this.)

If your RNG is good enough, just use it. If it's not good enough, use
something else.


If you want to keep the sequence secret (for cryptography, for
example), *don't* use rand(). srand() and rand() are specifically
designed to produce *repeatable* sequences. (And if you want reliable
advice, ask someone who knows more about this stuff than I do.)
I agree. That's what I was meaning to say starting with 'However' and
the
1 and 2 points :).
 
M

mensanator

Walter said:
Not necessarily.

Then what does "period" mean?
Not necessarily.
Ok.


It would be possible to the initial seed to affect parameters
in the PRNG, such that srand(1) and srand(2) each had the same
period, but that the cyles for the two were not the same.

Ok, but we can't infer that from the standard, can we?
Getting further OT:

It appears to me that the traditional Unix drand48() might be
an instance of this. According to the man page:

The initializer function srand48 sets the high-order 32 bits of Xi
to the 32 bits contained in its argument. The low-order 16 bits of
Xi are set to the arbitrary value 330E16.

Isn't 330E16 24 bits?
 
K

Keith Thompson

Ben Bacarisse said:
I don't think the standard mandates any state size, does it? I don't
think there is nothing to stop rand() being a very high quality generator.

Right, but in practice, the common wisdom is that rand() is only
barely adequate, so programs that need high-quality random numbers
don't use it anyway, so there's little motivation for implementers to
improve it.

Also, the standard provides a sample implementation, and many
implementers just use that.
 
M

Micah Cowan

Then what does "period" mean?

It means that at some point, it will start over again from the
beginning.

Consider a really dumb PNRG that only ever has a cycle of four
different outputs. Perhaps srand(1) will result in the sequence:

1 6 3 12 1 6 3 12 ...

And srand(2) results in:

4 9 5 3 4 9 5 3 ...

note that they both have a period of four, but the values that are
cycled are completely different.
 
K

Keith Thompson

Micah Cowan said:
It means that at some point, it will start over again from the
beginning.

Consider a really dumb PNRG that only ever has a cycle of four
different outputs. Perhaps srand(1) will result in the sequence:

1 6 3 12 1 6 3 12 ...

And srand(2) results in:

4 9 5 3 4 9 5 3 ...

note that they both have a period of four, but the values that are
cycled are completely different.

The random number generator has at least 8 distinct states (3 bits),
which means it *could* have a cycle of 8 outputs, making full use of
its internal state. For example, srand(1) might result in:

1 6 3 12 4 9 5 3 1 6 3 12 4 9 5 3

and srand(2) might result in:

4 9 5 3 1 6 3 12 4 9 5 3 1 6 3 12

But, as you said, it's a really dumb RNG. Generally speaking, using
only a subset of the available state for a given sequence is allowed
by the standard, but *probably* not a good idea (unless the state is
large enough that a subset of it gives you a very long repeating cycle
anyway).
 
M

mensanator

Micah said:
It means that at some point, it will start over again from the
beginning.

Isn't that what I said? The issue HERE is how can a periodic
function NOT return eventually to where it started?
Consider a really dumb PNRG that only ever has a cycle of four
different outputs. Perhaps srand(1) will result in the sequence:

1 6 3 12 1 6 3 12 ...

And srand(2) results in:

4 9 5 3 4 9 5 3 ...

note that they both have a period of four, but the values that are
cycled are completely different.

That applies to the second "Not necessarily." comment,
which I did not dispute.

But can this happen in a PRNG with an n-bit state and a
2**n-1 period? Am I correct in stating that in such a case there
is only ONE sequence and srand() merely positions to a new
starting point in that one sequence? For srand() to cause
completely different sequences don't you need either more
bits in the state or else a different algorithm for each value
passed to srand()?
 
M

Micah Cowan

Keith Thompson said:
The random number generator has at least 8 distinct states (3 bits),
which means it *could* have a cycle of 8 outputs, making full use of
its internal state. For example, srand(1) might result in:

1 6 3 12 4 9 5 3 1 6 3 12 4 9 5 3

and srand(2) might result in:

4 9 5 3 1 6 3 12 4 9 5 3 1 6 3 12

Yes, but this has nothing to do with the point I was making, which was
to answer the question that was asked:
 
M

Micah Cowan

Isn't that what I said? The issue HERE is how can a periodic
function NOT return eventually to where it started?

Oh, yeah, apparently. Guess I misread again.
But can this happen in a PRNG with an n-bit state and a
2**n-1 period?

Actually, I think you mean 2**n period. Use a very small n to consider it.
Am I correct in stating that in such a case there
is only ONE sequence and srand() merely positions to a new
starting point in that one sequence?

That seems right to me.
For srand() to cause
completely different sequences don't you need either more
bits in the state or else a different algorithm for each value
passed to srand()?

The second case is essentially the same as the first, I think; but
either way I believe that's a yes.
 
W

Walter Roberson

When it's stated that
"...the Mersenne Twister as the core generator.
It produces 53-bit precision floats and has a period of 2**19937-1."
Doesn't it mean that there are 2**19937-1 states and if one were
to actually call it 2**19937 times, then one would be right back
where one started regardless of what number srand() was called with?

Period is the length of the cycle, but there can be an indefinite
number of states before it enters the cycle.

Trivial example:

static unsigned int just_seeded = 1;
static unsigned int seed = 1;

void srand(unsigned int newseed) {
just_seeded = 1 + (newseed - 1) % 31415;
seed = newseed;
}

int rand(void) {
int new_rand_num
if (just_seeded == 0) {
new_rand_num = /* regular PNRG goes here */
} else {
just_seeded--;
new_rand_num = 42;
}
return new_rand_num;
}

That is, emit 42 between 1 and 31415 times (dependant upon the seed)
before entering into the cycle. When you finally encounter 42
"naturally" you will not be back where you started, and if the period
is 2**19937-1 then after that many calls, you will not be back where
you started either. Indeed, no matter how many calls you make
to rand() without srand(), you never get back where you started
with this particular rand().
 
W

Walter Roberson

For srand() to cause
completely different sequences don't you need either more
bits in the state

More bits than what? The state is not necessarily restricted
to the number of bits in an unsigned int (i.e., the argument to srand()),
and different arguments of srand() do not necessarily affect the same
state bits in the same way.
or else a different algorithm for each value
passed to srand()?

static unsigned long m_list[] = { /* list of possible multipliers */ };
static unsigned long c_list[] = { /* list of possible constants */ };
static unsigned long current_m = /* second element of m_list */;
static unsigned long current_c = /* second element of c_list */;
static unsigned long seed = 1;

void srand( unsigned int newseed ) {
seed = newseed;
current_m = m_list[seed % (sizeof m_list / sizeof m_list[0])];
current_c = c_list[seed % (sizeof c_list / sizeof c_list[0])];
}

int rand(void) {
/* insert a LCG with coefficients current_m and current_c */
}

If the sizes of m_list and c_list were mutually prime and the
product of the sizes was greater than UINT_MAX then you would
produce a different PNRG for every possible seed, and yet it
would be exactly the same algorithm for each of them.
 
M

Micah Cowan

More bits than what? The state is not necessarily restricted
to the number of bits in an unsigned int (i.e., the argument to srand()),
and different arguments of srand() do not necessarily affect the same
state bits in the same way.

You snipped the answer to your question. More bits than n, to produce
more than one sequence, if one of the sequences has a period of 2**n.

-Micah
 
M

mensanator

Walter said:
Period is the length of the cycle, but there can be an indefinite
number of states before it enters the cycle.

Trivial example:

static unsigned int just_seeded = 1;
static unsigned int seed = 1;

void srand(unsigned int newseed) {
just_seeded = 1 + (newseed - 1) % 31415;
seed = newseed;
}

int rand(void) {
int new_rand_num
if (just_seeded == 0) {
new_rand_num = /* regular PNRG goes here */
} else {
just_seeded--;
new_rand_num = 42;
}
return new_rand_num;
}

That is, emit 42 between 1 and 31415 times (dependant upon the seed)
before entering into the cycle. When you finally encounter 42
"naturally" you will not be back where you started, and if the period
is 2**19937-1 then after that many calls, you will not be back where
you started either. Indeed, no matter how many calls you make
to rand() without srand(), you never get back where you started
with this particular rand().

Ok, I accept the "not necessarily" comments.

But if all you have to go by is the standard, the issue remains:

should you call srand() more than once to "improve" the randomness?

I've been trying to justify answering no, but if those justifications
don't necessarily hold, does the answer change to yes, or does it
remain no?

 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top