portably shuffle a deck

Keith Thompson · Aug 2, 2007

CBFalconer said:
Each reseed has, probably, one of 2^32 possible sequences to
follow. This is not enough. However, if you don't reseed then the
actual sequences can be multiplied by the possiblities inherent in
the RNG internal sequencing. This may or may not be enough (see
the Mersenne Twister), but at least you have gotten past the 2^32
possibility limit.

Maybe. It depends on the number of bits in the argument to srand()
(unsigned int, typically 32 bits, at least 16) vs. the number of bits
of internal state.

The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.

Of course int and long aren't necessarily 32 bits, and rand() doesn't
necessarily use the sample implementation in the standard, so that's
just one possible example.

Richard Heathfield · Aug 2, 2007

Keith Thompson said:

The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.

It depends. Multiple sequences are certainly a risk. For instance,
consider a rand() implementation that did this:

unsigned long _seed = 0;
void srand(unsigned int _s)
{
_seed = _s;
}
int rand(void)
{
_seed += 256;
return _seed;
}

So if you seed this with 0, you get 256, 512, 768, 1024...
If you see it with 1 instead, you get 257, 513, 769, 1025...
If you seed with 2, you get 258, 514, 770, 1026...

On implementations with CHAR_BIT = 8, all of these 256 different
sequences will wrap around to their beginnings again, without ever
picking any value from any of the others.

Clearly, this is not a high-QoI implementation! But it illustrates the
point that even a simplistic implementation can easily fall into the
trap of creating multiple sequences.

Keith Thompson · Aug 2, 2007

Richard Heathfield said:
Keith Thompson said:

The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.

Click to expand...

It depends. Multiple sequences are certainly a risk. For instance,
consider a rand() implementation that did this: [snip]
Clearly, this is not a high-QoI implementation! But it illustrates the
point that even a simplistic implementation can easily fall into the
trap of creating multiple sequences.

Sure, but I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.

Army1987 · Aug 2, 2007

I have written a function to shuffle a deck of cards, as follows:
[...]

void shuffle(deck_t * deckptr)
{
static unsigned int times = 0;
int i;
srand((unsigned)clock() ^ (unsigned)time(0) + times++);

Click to expand...

Uh ... you are reseeding every shuffle? This is ok if your entropy
has high quality and you don't care about performance.

Well, I didn't think initially that there aren't more than
UINT_MAX + 1 things srand() can do. The point which CBFalconer
made downthread applies. But anyway, on a system where the number
of states of rand() is less than UINT_MAX + 1 reseeding each time
*would* make it less predictable, I think. (But I don't think such
systems are common.)

This has a bias problem.

I know, but it is negligible IMO.
deckptr->size - i is at least 630 times smaller than RAND_MAX,
and it will become even smaller as the shuffle proceeds.
A rough (over)estimate is (1 + 1/630)**52 = 1.086, so with this
bias, assuming rand() was perfect, some shuffles would be 9% more
likely than others. This pales in comparison with the fact which
too few states causes the probability of a shuffle to be either 0
or zillions of times larger than it should.

[...]

The problem is: this cannot portably shuffle a deck, because it
requires rand() to have at least 52! states. (If it has less states,
the pigeonhole principle shows that not all permutations are possible.)

Click to expand...

Actually any quantity of shuffles you make < 52! will always leave some
shuffles unavailable to you. If you actually want all 52! states to be
possible with roughly equal proportion, then you can use the Mersenne
Twister, which has far more states (it has 2**19937 states). But you
still have to come up with log(2,52!) bits worth of entropy to seed the
thing and that is not coming from clock(), time() and a counter alone.

The simplest way of creating/generating that much entropy is probably to
maintain an array a rotating offset pointer, and simply xor one of

Cannot parse that sentence. But I did consider using
(unsigned long)&some_auto_variable among the entropy sources (yes,
I think it is way insufficient by its own, but if I need 226 bits
of seed, it can help). Also a hash of *(unsigned char *)stdin,
*((unsigned char *)stdin + 1), ... *((unsigned char *)stdin +
sizeof(FILE) - 1) could help, couldn't it?

time() or clock() into that pointer's location, and maintain the state
of this array on disk (so that it is incrementally updated from run to
run.) The array would be the entropic input, that you would only need
to set once (not once per shuffle).

[...]

Well that bias will color things more than anything else really. Getting
an accurate discrete random sample is actually a bit of an exercise on
binary computers as I explain here:

http://www.pobox.com/~qed/random.html

Well, for deckptr->card[0] (which, given the way I implemented
card_t draw_card(deck_t *deckptr), is actually the *last* card
which will be drawn), in a full deck (deckptr->size ==
MAX_DECKSIZE, i.e. 52), deckptr->size - i will be 52, and RAND_MAX
cannot be less that 32767, so 1000 * RAND_MAX / range will be at
least 630000. This is potentially almost 10 times larger than
UINT_MAX, (and will become even larger as the shuffle proceeds)
so it is well possible that we can't be able to see the bias even
examining the outcome of the shuffle seeded with srand(0), the one
with srand(1) and so on through srand(UINT_MAX). (Yes, UINT_MAX is
very likely to be larger than that, but so is RAND_MAX. Anyway the
point is that "more than anything else really" sounds like a
little bit of an exaggeration to me. Using a loop to discard the
return values of rand() larger than RAND_MAX-(RAND_MAX + 1)%range
would only solve the smallest part of the problem.

Spoon · Aug 2, 2007

Eric said:
MT is often mentioned, and George Marsaglia posted a few to this
newsgroup before the insult artists drove him off.

http://groups.google.com/group/comp.lang.c/msg/bf05fb2c7c109670?dmode=source

From: Eric Sosman
Newsgroups: comp.lang.c
Date: Fri, 07 Feb 2003 09:55:13 -0500
Message-ID: <[email protected]>

Richard Heathfield · Aug 2, 2007

Keith Thompson said:

[...] I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.

Let's take another look at it:

<quote std="c89" status="draft">
static unsigned long int next = 1;

int rand(void) /* RAND_MAX assumed to be 32767 */
{
next = next * 1103515245 + 12345;
return (unsigned int)(next/65536) % 32768;
}

void srand(unsigned int seed)
{
next = seed;
}
</quote>

The stated assumption doesn't affect the sequence, of course. It merely
explains the modulo. In fact, the return value can be ignored
completely, since it does not affect whether there is a single
sequence, but only what value rand returns, which is irrelevant.

What does affect the sequence, however, is srand - it is not sufficient
to prove that the sequence starting at 1 is closed. It is necessary
(and, I believe, sufficient) to demonstrate that a single sequence
includes all possible unsigned int values.

Since UINT_MAX is not bounded by the Standard, I have to concede that
such a proof is beyond my mathematical ken (or at least, I can't see a
way right now to prove it). Induction might be the way to go - i.e. if
we can prove that there is a single sequence incorporating all the
values of unsigned int up to 65535, *and* we can prove that *if* the
sequence is closed for N bits, it must also be closed for N+1 bits,
then we have proved that there is indeed a single sequence.

Jean-Marc Bourguet · Aug 2, 2007

Richard Heathfield said:
Keith Thompson said:

[...] I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.

Click to expand...

Let's take another look at it:

<quote std="c89" status="draft">
static unsigned long int next = 1;

int rand(void) /* RAND_MAX assumed to be 32767 */
{
next = next * 1103515245 + 12345;
return (unsigned int)(next/65536) % 32768;
}

void srand(unsigned int seed)
{
next = seed;
}
</quote>

The stated assumption doesn't affect the sequence, of course. It merely
explains the modulo. In fact, the return value can be ignored
completely, since it does not affect whether there is a single
sequence, but only what value rand returns, which is irrelevant.

What does affect the sequence, however, is srand - it is not sufficient
to prove that the sequence starting at 1 is closed. It is necessary
(and, I believe, sufficient) to demonstrate that a single sequence
includes all possible unsigned int values.

Since UINT_MAX is not bounded by the Standard, I have to concede that
such a proof is beyond my mathematical ken (or at least, I can't see a
way right now to prove it). Induction might be the way to go - i.e. if
we can prove that there is a single sequence incorporating all the
values of unsigned int up to 65535, *and* we can prove that *if* the
sequence is closed for N bits, it must also be closed for N+1 bits,
then we have proved that there is indeed a single sequence.

The sequence is independant of any bit of the seed after the 31st one.

Let next = next1 * (1<<31) + next2. The next generated number is

((next * 1103515245 + 12345) /65536) % 32768
= (((next1 * (1<<31) + next2) * 1103515245 + 12345) /65536) % 32768
= ((next1*1103515245*(1<<31) + next2*1103515245+12345) / (1<<16)) % (1<<15)
= (next1*1103515245*(1<<31)/(1<<16) + (next2*1103515245+12345)/(1<<16)) %
(1<<15)
= (next1*1103515245*(1<<15) + (next2*1103515245+12345)/(1<<16)) % (1<<15)
= ((next2*1103515245+12345)/(1<<16)) % (1<<15)
and depend only of next2.

The next seed is next1*1103515245*(1<<31) + next2*1103515245 + 12345 and
the contribution of next1 is again multiplied by 1<<31 so it can't
contribute to other generated numbers.

The fact that the sequence next = (next * 1103515245 + 12345) % M generates
all numbers below M if M and 1103515245 don't share a factor is well known
(see for example in Knuth). As the standard constraint UINT_MAX+1 to be a
power of 2, it can't share a factor with 1103515245. So there is a single
sequence of length 1<<31 (as UINT_MAX+1 is guaranteed to be larger).

Yours,

Keith Thompson · Aug 3, 2007

Keith Thompson said:
Richard Heathfield said:

Keith Thompson said:

The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.

Click to expand...

It depends. Multiple sequences are certainly a risk. For instance,
consider a rand() implementation that did this: [snip]
Clearly, this is not a high-QoI implementation! But it illustrates the
point that even a simplistic implementation can easily fall into the
trap of creating multiple sequences.

Click to expand...

Sure, but I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.

And I've just verified that it does, by running through all 2**32
values. I didn't confirm that it hits all 2**32 values, but it does
take exactly 2**32 iterations for the internal state to go back to 1.

(No, I didn't do it manually.)

Eric Sosman · Aug 3, 2007

Keith said:
Keith Thompson said:

Richard Heathfield said:

Keith Thompson said:
<snip>

The sample implementation in the standard uses (assuming 32-bit int
and 32-bit long) a 32-bit seed and a 32-bit internal state. In that
implementation, if I'm not mistaken, there's only one sequence;
calling srand() merely picks a location along that sequence.
It depends. Multiple sequences are certainly a risk. For instance,
consider a rand() implementation that did this: [snip]
Clearly, this is not a high-QoI implementation! But it illustrates the
point that even a simplistic implementation can easily fall into the
trap of creating multiple sequences.

Click to expand...

Sure, but I was specifically talking about the sample implementation
in the standard. I *think* it has a single sequence.

Click to expand...

And I've just verified that it does, by running through all 2**32
values. I didn't confirm that it hits all 2**32 values, but it does
take exactly 2**32 iterations for the internal state to go back to 1.

(No, I didn't do it manually.)

<off-topic>

Good grief!!!

What would you have done on a system with a 64-bit
unsigned long?

To save a considerable amount of labor next time
(and to relieve yourself of any worries about programming
misteaks), see TAOCP Theorem 3.2.1.2A. ... from which it
is obvious by inspection and with no computation at all --
well, with only as much computation as it takes to verify
that 1103515245 % 4 == 1 and 12345 % 2 == 1 -- that the
sequence is periodic with period ULONG_MAX+1. Oh, yes: it
hits all the values [0..ULONG_MAX] along the way (a fact
you could deduce from the period length alone).

Don't programmers learn *anything* about PRNGs nowadays?

</off-topic>

Richard Heathfield · Aug 3, 2007

Eric Sosman said:

Don't programmers learn *anything* about PRNGs nowadays?

No, I doesn't believe they does. There's a lot they doesn't learn about
nowadays, and PRNGs is one of them.

Eric Sosman · Aug 3, 2007

Richard said:
Eric Sosman said:

No, I doesn't believe they does. There's a lot they doesn't learn about
nowadays, and PRNGs is one of them.

"But you try and tell the young people today that...
and they won't believe ya'." -- Michael Palin

A question about program logic... program provided	20	Sep 20, 2013
self studying: this program slows down after repeated runs on my machine.	10	Sep 2, 2013
Shuffle method error	4	Jun 19, 2006
Deck of Cards -- Shuffling (void shuffle)	4	Feb 19, 2007
Needing Help With My First JavaScript script	1	Mar 22, 2018
Portably extracting data from a bytestring	17	Oct 25, 2005
How do you guys limit rand() results to a range?	133	Jun 1, 2014
Python: Code is ignoring the if and else	11	Aug 3, 2013

portably shuffle a deck

Keith Thompson

Richard Heathfield

Keith Thompson

Army1987

Spoon

Richard Heathfield

Jean-Marc Bourguet

Keith Thompson

Eric Sosman

Richard Heathfield

Eric Sosman

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads