portably shuffle a deck

K

Keith Thompson

CBFalconer said:
Each reseed has, probably, one of 2^32 possible sequences to
follow. This is not enough. However, if you don't reseed then the
actual sequences can be multiplied by the possiblities inherent in
the RNG internal sequencing. This may or may not be enough (see
the Mersenne Twister), but at least you have gotten past the 2^32
possibility limit.

Maybe. It depends on the number of bits in the argument to srand()
(unsigned int, typically 32 bits, at least 16) vs. the number of bits
of internal state.

The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.

Of course int and long aren't necessarily 32 bits, and rand() doesn't
necessarily use the sample implementation in the standard, so that's
just one possible example.
 
R

Richard Heathfield

Keith Thompson said:

The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.

It depends. Multiple sequences are certainly a risk. For instance,
consider a rand() implementation that did this:

unsigned long _seed = 0;
void srand(unsigned int _s)
{
_seed = _s;
}
int rand(void)
{
_seed += 256;
return _seed;
}

So if you seed this with 0, you get 256, 512, 768, 1024...
If you see it with 1 instead, you get 257, 513, 769, 1025...
If you seed with 2, you get 258, 514, 770, 1026...

On implementations with CHAR_BIT = 8, all of these 256 different
sequences will wrap around to their beginnings again, without ever
picking any value from any of the others.

Clearly, this is not a high-QoI implementation! But it illustrates the
point that even a simplistic implementation can easily fall into the
trap of creating multiple sequences.
 
K

Keith Thompson

Richard Heathfield said:
Keith Thompson said:

The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.

It depends. Multiple sequences are certainly a risk. For instance,
consider a rand() implementation that did this: [snip]
Clearly, this is not a high-QoI implementation! But it illustrates the
point that even a simplistic implementation can easily fall into the
trap of creating multiple sequences.

Sure, but I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.
 
A

Army1987

I have written a function to shuffle a deck of cards, as follows:
[...]

void shuffle(deck_t * deckptr)
{
static unsigned int times = 0;
int i;
srand((unsigned)clock() ^ (unsigned)time(0) + times++);

Uh ... you are reseeding every shuffle? This is ok if your entropy
has high quality and you don't care about performance.
Well, I didn't think initially that there aren't more than
UINT_MAX + 1 things srand() can do. The point which CBFalconer
made downthread applies. But anyway, on a system where the number
of states of rand() is less than UINT_MAX + 1 reseeding each time
*would* make it less predictable, I think. (But I don't think such
systems are common.)
This has a bias problem.
I know, but it is negligible IMO.
deckptr->size - i is at least 630 times smaller than RAND_MAX,
and it will become even smaller as the shuffle proceeds.
A rough (over)estimate is (1 + 1/630)**52 = 1.086, so with this
bias, assuming rand() was perfect, some shuffles would be 9% more
likely than others. This pales in comparison with the fact which
too few states causes the probability of a shuffle to be either 0
or zillions of times larger than it should.
[...]
The problem is: this cannot portably shuffle a deck, because it
requires rand() to have at least 52! states. (If it has less states,
the pigeonhole principle shows that not all permutations are possible.)

Actually any quantity of shuffles you make < 52! will always leave some
shuffles unavailable to you. If you actually want all 52! states to be
possible with roughly equal proportion, then you can use the Mersenne
Twister, which has far more states (it has 2**19937 states). But you
still have to come up with log(2,52!) bits worth of entropy to seed the
thing and that is not coming from clock(), time() and a counter alone.

The simplest way of creating/generating that much entropy is probably to
maintain an array a rotating offset pointer, and simply xor one of
Cannot parse that sentence. But I did consider using
(unsigned long)&some_auto_variable among the entropy sources (yes,
I think it is way insufficient by its own, but if I need 226 bits
of seed, it can help). Also a hash of *(unsigned char *)stdin,
*((unsigned char *)stdin + 1), ... *((unsigned char *)stdin +
sizeof(FILE) - 1) could help, couldn't it?
time() or clock() into that pointer's location, and maintain the state
of this array on disk (so that it is incrementally updated from run to
run.) The array would be the entropic input, that you would only need
to set once (not once per shuffle).

[...]
Well that bias will color things more than anything else really. Getting
an accurate discrete random sample is actually a bit of an exercise on
binary computers as I explain here:

http://www.pobox.com/~qed/random.html
Well, for deckptr->card[0] (which, given the way I implemented
card_t draw_card(deck_t *deckptr), is actually the *last* card
which will be drawn), in a full deck (deckptr->size ==
MAX_DECKSIZE, i.e. 52), deckptr->size - i will be 52, and RAND_MAX
cannot be less that 32767, so 1000 * RAND_MAX / range will be at
least 630000. This is potentially almost 10 times larger than
UINT_MAX, (and will become even larger as the shuffle proceeds)
so it is well possible that we can't be able to see the bias even
examining the outcome of the shuffle seeded with srand(0), the one
with srand(1) and so on through srand(UINT_MAX). (Yes, UINT_MAX is
very likely to be larger than that, but so is RAND_MAX. Anyway the
point is that "more than anything else really" sounds like a
little bit of an exaggeration to me. Using a loop to discard the
return values of rand() larger than RAND_MAX-(RAND_MAX + 1)%range
would only solve the smallest part of the problem.
 
R

Richard Heathfield

Keith Thompson said:
[...] I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.

Let's take another look at it:

<quote std="c89" status="draft">
static unsigned long int next = 1;

int rand(void) /* RAND_MAX assumed to be 32767 */
{
next = next * 1103515245 + 12345;
return (unsigned int)(next/65536) % 32768;
}

void srand(unsigned int seed)
{
next = seed;
}
</quote>

The stated assumption doesn't affect the sequence, of course. It merely
explains the modulo. In fact, the return value can be ignored
completely, since it does not affect whether there is a single
sequence, but only what value rand returns, which is irrelevant.

What does affect the sequence, however, is srand - it is not sufficient
to prove that the sequence starting at 1 is closed. It is necessary
(and, I believe, sufficient) to demonstrate that a single sequence
includes all possible unsigned int values.

Since UINT_MAX is not bounded by the Standard, I have to concede that
such a proof is beyond my mathematical ken (or at least, I can't see a
way right now to prove it). Induction might be the way to go - i.e. if
we can prove that there is a single sequence incorporating all the
values of unsigned int up to 65535, *and* we can prove that *if* the
sequence is closed for N bits, it must also be closed for N+1 bits,
then we have proved that there is indeed a single sequence.
 
J

Jean-Marc Bourguet

Richard Heathfield said:
Keith Thompson said:
[...] I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.

Let's take another look at it:

<quote std="c89" status="draft">
static unsigned long int next = 1;

int rand(void) /* RAND_MAX assumed to be 32767 */
{
next = next * 1103515245 + 12345;
return (unsigned int)(next/65536) % 32768;
}

void srand(unsigned int seed)
{
next = seed;
}
</quote>

The stated assumption doesn't affect the sequence, of course. It merely
explains the modulo. In fact, the return value can be ignored
completely, since it does not affect whether there is a single
sequence, but only what value rand returns, which is irrelevant.

What does affect the sequence, however, is srand - it is not sufficient
to prove that the sequence starting at 1 is closed. It is necessary
(and, I believe, sufficient) to demonstrate that a single sequence
includes all possible unsigned int values.

Since UINT_MAX is not bounded by the Standard, I have to concede that
such a proof is beyond my mathematical ken (or at least, I can't see a
way right now to prove it). Induction might be the way to go - i.e. if
we can prove that there is a single sequence incorporating all the
values of unsigned int up to 65535, *and* we can prove that *if* the
sequence is closed for N bits, it must also be closed for N+1 bits,
then we have proved that there is indeed a single sequence.

The sequence is independant of any bit of the seed after the 31st one.

Let next = next1 * (1<<31) + next2. The next generated number is

((next * 1103515245 + 12345) /65536) % 32768
= (((next1 * (1<<31) + next2) * 1103515245 + 12345) /65536) % 32768
= ((next1*1103515245*(1<<31) + next2*1103515245+12345) / (1<<16)) % (1<<15)
= (next1*1103515245*(1<<31)/(1<<16) + (next2*1103515245+12345)/(1<<16)) %
(1<<15)
= (next1*1103515245*(1<<15) + (next2*1103515245+12345)/(1<<16)) % (1<<15)
= ((next2*1103515245+12345)/(1<<16)) % (1<<15)
and depend only of next2.

The next seed is next1*1103515245*(1<<31) + next2*1103515245 + 12345 and
the contribution of next1 is again multiplied by 1<<31 so it can't
contribute to other generated numbers.

The fact that the sequence next = (next * 1103515245 + 12345) % M generates
all numbers below M if M and 1103515245 don't share a factor is well known
(see for example in Knuth). As the standard constraint UINT_MAX+1 to be a
power of 2, it can't share a factor with 1103515245. So there is a single
sequence of length 1<<31 (as UINT_MAX+1 is guaranteed to be larger).

Yours,
 
K

Keith Thompson

Keith Thompson said:
Richard Heathfield said:
Keith Thompson said:
The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.

It depends. Multiple sequences are certainly a risk. For instance,
consider a rand() implementation that did this: [snip]
Clearly, this is not a high-QoI implementation! But it illustrates the
point that even a simplistic implementation can easily fall into the
trap of creating multiple sequences.

Sure, but I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.

And I've just verified that it does, by running through all 2**32
values. I didn't confirm that it hits all 2**32 values, but it does
take exactly 2**32 iterations for the internal state to go back to 1.

(No, I didn't do it manually.)
 
E

Eric Sosman

Keith said:
Keith Thompson said:
Richard Heathfield said:
Keith Thompson said:
<snip>

The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.
It depends. Multiple sequences are certainly a risk. For instance,
consider a rand() implementation that did this: [snip]
Clearly, this is not a high-QoI implementation! But it illustrates the
point that even a simplistic implementation can easily fall into the
trap of creating multiple sequences.
Sure, but I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.

And I've just verified that it does, by running through all 2**32
values. I didn't confirm that it hits all 2**32 values, but it does
take exactly 2**32 iterations for the internal state to go back to 1.

(No, I didn't do it manually.)

<off-topic>

Good grief!!!

What would you have done on a system with a 64-bit
unsigned long?

To save a considerable amount of labor next time
(and to relieve yourself of any worries about programming
misteaks), see TAOCP Theorem 3.2.1.2A. ... from which it
is obvious by inspection and with no computation at all --
well, with only as much computation as it takes to verify
that 1103515245 % 4 == 1 and 12345 % 2 == 1 -- that the
sequence is periodic with period ULONG_MAX+1. Oh, yes: it
hits all the values [0..ULONG_MAX] along the way (a fact
you could deduce from the period length alone).

Don't programmers learn *anything* about PRNGs nowadays?

</off-topic>
 
R

Richard Heathfield

Eric Sosman said:

Don't programmers learn *anything* about PRNGs nowadays?

No, I doesn't believe they does. There's a lot they doesn't learn about
nowadays, and PRNGs is one of them.
 
E

Eric Sosman

Richard said:
Eric Sosman said:



No, I doesn't believe they does. There's a lot they doesn't learn about
nowadays, and PRNGs is one of them.

"But you try and tell the young people today that...
and they won't believe ya'." -- Michael Palin
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,773
Messages
2,569,594
Members
45,124
Latest member
JuniorPell
Top