# python & mathematical methods of picking numbers at random

Discussion in 'Python' started by Bart Nessux, Jan 15, 2004.

1. ### Bart NessuxGuest

I am using method 'a' below to pick 25 names from a pool of 225. A
co-worker is using method 'b' by running it 25 times and throwing out
the winning name (names are associated with numbers) after each run and
then re-counting the list and doing it all over again.

My boss thinks that 'b' is somehow less fair than 'a', but the only
thing I see wrong with it is that it is really inefficient and ugly as
it's doing manually what 'a' does automatically, other than that I think
the outcome of both methods (25 unique winners from a pool of 225) are
the same. Anyone disagree with that and if so, please demonstrate how
'b' isn't as fair as 'a'

count = len(list_of_potential_winners)

a = random.sample(range(count), 25)

b = random.sample(range(count), 1)

Thanks!
Bart

Bart Nessux, Jan 15, 2004

2. ### Jeff EplerGuest

It sounds like your co-worker has re-written sample. random.sample(l, 1)
is the same as random.choice(l), so that's another source of inefficiency.

But why are *you* using
random.sample(range(len(x)), 25)
random.sample(x, 25)
?

Jeff

Jeff Epler, Jan 15, 2004

3. ### Bart NessuxGuest

Because it works and it's fast and len(count) changes every drawing.

Bart Nessux, Jan 15, 2004
4. ### Paul RubinGuest

Both are the same, as you can see by calculating the probability of
any given name being selected. What is the application, and the
computer environment? You may also need to worry about correlations
in the underlying Mersenne Twister PRNG. If the application is
something where randomness is very important (you're picking winners
for a big lottery or something) then you should use a better RNG.

Paul Rubin, Jan 15, 2004
5. ### Bart NessuxGuest

We're raffling off crock-pots... that's why I think this is OK for our
purposes.

Bart Nessux, Jan 15, 2004
6. ### Terry ReedyGuest

I think you missed Jeff's point, which is that you are repeating part of
the work that sample tries to do for you. From the Lib Ref:
"
sample(sequence, k): Return a k length list of unique elements chosen from
the population sequence. Used for random sampling without replacement. New
in version 2.3.

Returns a new list containing elements from the population while leaving
the original population unchanged. The resulting list is in selection order
so that all sub-slices will also be valid random samples. This allows
raffle winners (the sample) to be partitioned into grand prize and second
place winners (the subslices).
"
When you get the sample from range(n), you have to use them as indexes into
x to get the actual list of names. But the indexing and extraction is what
sample would do if you gave it x instead of range(x)!

Terry J. Reedy

Terry Reedy, Jan 16, 2004
7. ### Mark BorgerdingGuest

Some will claim you cooked the numbers, even if it is a crock.
Let 'em blow off some steam, but don't chicken out. If you let them stew
for a day, they'll soften up and you'll eventually reach a cord.

Mark Borgerding, Jan 16, 2004
8. ### Samuel WaltersGuest

| Bart Nessux said |
I looked at the code for random.sample, and found out that the two methods
are probabilistically equivalent. Neither is more or less fair than the
other.

You can, however, poke fun at your cow-orker for using
random.sample(range(count, 1) when random.randint(1,count) would have done
the exact same thing with the way he used random.sample.

HTH

Sam Walters.

P.S. The code for sample in random.py is very simple and fairly
straightforward. You should take a peek at it. The basic algorithm is to
make a list of winners, choose a random number, then if the winner is not
retry until a new winner comes up. Repeat until you have the desired
number of winners.

Samuel Walters, Jan 16, 2004
9. ### Bart NessuxGuest

Ahh, I see what you mean.

Bart Nessux, Jan 16, 2004
10. ### Bart NessuxGuest

Also, the below statement should be removed from random's
documentation... it's where I got the idea to do:

random.sample(range(len(x)), 25)
random.sample(x, 25)

"To choose a sample from a range of integers, use xrange
as an argument. This is especially fast and space efficient
for sampling from a large population: sample(xrange(10000000), 60)."

http://www.python.org/doc/current/lib/module-random.html

Bart Nessux, Jan 16, 2004