python & mathematical methods of picking numbers at random

Discussion in 'Python' started by Bart Nessux, Jan 15, 2004.

  1. Bart Nessux

    Bart Nessux Guest

    I am using method 'a' below to pick 25 names from a pool of 225. A
    co-worker is using method 'b' by running it 25 times and throwing out
    the winning name (names are associated with numbers) after each run and
    then re-counting the list and doing it all over again.

    My boss thinks that 'b' is somehow less fair than 'a', but the only
    thing I see wrong with it is that it is really inefficient and ugly as
    it's doing manually what 'a' does automatically, other than that I think
    the outcome of both methods (25 unique winners from a pool of 225) are
    the same. Anyone disagree with that and if so, please demonstrate how
    'b' isn't as fair as 'a'

    count = len(list_of_potential_winners)

    a = random.sample(range(count), 25)

    b = random.sample(range(count), 1)

    Thanks!
    Bart
     
    Bart Nessux, Jan 15, 2004
    #1
    1. Advertising

  2. Bart Nessux

    Jeff Epler Guest

    It sounds like your co-worker has re-written sample. random.sample(l, 1)
    is the same as random.choice(l), so that's another source of inefficiency.

    But why are *you* using
    random.sample(range(len(x)), 25)
    instead of
    random.sample(x, 25)
    ?

    Jeff
     
    Jeff Epler, Jan 15, 2004
    #2
    1. Advertising

  3. Bart Nessux

    Bart Nessux Guest

    Jeff Epler wrote:
    > It sounds like your co-worker has re-written sample. random.sample(l, 1)
    > is the same as random.choice(l), so that's another source of inefficiency.
    >
    > But why are *you* using
    > random.sample(range(len(x)), 25)
    > instead of
    > random.sample(x, 25)
    > ?
    >
    > Jeff
    >


    Because it works and it's fast and len(count) changes every drawing.
     
    Bart Nessux, Jan 15, 2004
    #3
  4. Bart Nessux

    Paul Rubin Guest

    Bart Nessux <> writes:
    > I am using method 'a' below to pick 25 names from a pool of 225. A
    > co-worker is using method 'b' by running it 25 times and throwing out
    > the winning name (names are associated with numbers) after each run
    > and then re-counting the list and doing it all over again.
    >
    > My boss thinks that 'b' is somehow less fair than 'a',


    Both are the same, as you can see by calculating the probability of
    any given name being selected. What is the application, and the
    computer environment? You may also need to worry about correlations
    in the underlying Mersenne Twister PRNG. If the application is
    something where randomness is very important (you're picking winners
    for a big lottery or something) then you should use a better RNG.
     
    Paul Rubin, Jan 15, 2004
    #4
  5. Bart Nessux

    Bart Nessux Guest

    Paul Rubin wrote:
    > Bart Nessux <> writes:
    >
    >>I am using method 'a' below to pick 25 names from a pool of 225. A
    >>co-worker is using method 'b' by running it 25 times and throwing out
    >>the winning name (names are associated with numbers) after each run
    >>and then re-counting the list and doing it all over again.
    >>
    >>My boss thinks that 'b' is somehow less fair than 'a',

    >
    >
    > Both are the same, as you can see by calculating the probability of
    > any given name being selected. What is the application, and the
    > computer environment? You may also need to worry about correlations
    > in the underlying Mersenne Twister PRNG. If the application is
    > something where randomness is very important (you're picking winners
    > for a big lottery or something) then you should use a better RNG.


    We're raffling off crock-pots... that's why I think this is OK for our
    purposes.
     
    Bart Nessux, Jan 15, 2004
    #5
  6. Bart Nessux

    Terry Reedy Guest

    "Bart Nessux" <> wrote in message
    news:bu6rvn$njg$...
    > Jeff Epler wrote:
    > > But why are *you* using
    > > random.sample(range(len(x)), 25)
    > > instead of
    > > random.sample(x, 25)
    > > ?


    > Because it works and it's fast and len(count) changes every drawing.


    I think you missed Jeff's point, which is that you are repeating part of
    the work that sample tries to do for you. From the Lib Ref:
    "
    sample(sequence, k): Return a k length list of unique elements chosen from
    the population sequence. Used for random sampling without replacement. New
    in version 2.3.

    Returns a new list containing elements from the population while leaving
    the original population unchanged. The resulting list is in selection order
    so that all sub-slices will also be valid random samples. This allows
    raffle winners (the sample) to be partitioned into grand prize and second
    place winners (the subslices).
    "
    When you get the sample from range(n), you have to use them as indexes into
    x to get the actual list of names. But the indexing and extraction is what
    sample would do if you gave it x instead of range(x)!

    Terry J. Reedy
     
    Terry Reedy, Jan 16, 2004
    #6
  7. Bart Nessux wrote:
    > Paul Rubin wrote:
    >
    >> Bart Nessux <> writes:
    >>
    >>> I am using method 'a' below to pick 25 names from a pool of 225. A
    >>> co-worker is using method 'b' by running it 25 times and throwing out
    >>> the winning name (names are associated with numbers) after each run
    >>> and then re-counting the list and doing it all over again.
    >>>
    >>> My boss thinks that 'b' is somehow less fair than 'a',

    >>
    >>
    >>
    >> Both are the same, as you can see by calculating the probability of
    >> any given name being selected. What is the application, and the
    >> computer environment? You may also need to worry about correlations
    >> in the underlying Mersenne Twister PRNG. If the application is
    >> something where randomness is very important (you're picking winners
    >> for a big lottery or something) then you should use a better RNG.

    >
    >
    > We're raffling off crock-pots... that's why I think this is OK for our
    > purposes.
    >


    Some will claim you cooked the numbers, even if it is a crock.
    Let 'em blow off some steam, but don't chicken out. If you let them stew
    for a day, they'll soften up and you'll eventually reach a cord.
     
    Mark Borgerding, Jan 16, 2004
    #7
  8. | Bart Nessux said |

    > I am using method 'a' below to pick 25 names from a pool of 225. A
    > co-worker is using method 'b' by running it 25 times and throwing out the
    > winning name (names are associated with numbers) after each run and then
    > re-counting the list and doing it all over again.
    >
    > My boss thinks that 'b' is somehow less fair than 'a', but the only thing
    > I see wrong with it is that it is really inefficient and ugly as it's
    > doing manually what 'a' does automatically, other than that I think the
    > outcome of both methods (25 unique winners from a pool of 225) are the
    > same. Anyone disagree with that and if so, please demonstrate how 'b'
    > isn't as fair as 'a'
    >
    > count = len(list_of_potential_winners)
    >
    > a = random.sample(range(count), 25)
    >
    > b = random.sample(range(count), 1)
    >
    > Thanks!
    > Bart


    I looked at the code for random.sample, and found out that the two methods
    are probabilistically equivalent. Neither is more or less fair than the
    other.

    You can, however, poke fun at your cow-orker for using
    random.sample(range(count, 1) when random.randint(1,count) would have done
    the exact same thing with the way he used random.sample.

    HTH

    Sam Walters.

    P.S. The code for sample in random.py is very simple and fairly
    straightforward. You should take a peek at it. The basic algorithm is to
    make a list of winners, choose a random number, then if the winner is not
    already in the list, add them. If the winner is already in the list,
    retry until a new winner comes up. Repeat until you have the desired
    number of winners.

    --
    Never forget the halloween documents.
    http://www.opensource.org/halloween/
    """ Where will Microsoft try to drag you today?
    Do you really want to go there?"""
     
    Samuel Walters, Jan 16, 2004
    #8
  9. Bart Nessux

    Bart Nessux Guest

    Terry Reedy wrote:
    > "Bart Nessux" <> wrote in message
    > news:bu6rvn$njg$...
    >
    >>Jeff Epler wrote:
    >>
    >>>But why are *you* using
    >>> random.sample(range(len(x)), 25)
    >>>instead of
    >>> random.sample(x, 25)
    >>>?

    >
    >
    >>Because it works and it's fast and len(count) changes every drawing.

    >
    >
    > I think you missed Jeff's point, which is that you are repeating part of
    > the work that sample tries to do for you. From the Lib Ref:
    > "
    > sample(sequence, k): Return a k length list of unique elements chosen from
    > the population sequence. Used for random sampling without replacement. New
    > in version 2.3.
    >
    > Returns a new list containing elements from the population while leaving
    > the original population unchanged. The resulting list is in selection order
    > so that all sub-slices will also be valid random samples. This allows
    > raffle winners (the sample) to be partitioned into grand prize and second
    > place winners (the subslices).
    > "
    > When you get the sample from range(n), you have to use them as indexes into
    > x to get the actual list of names. But the indexing and extraction is what
    > sample would do if you gave it x instead of range(x)!


    Ahh, I see what you mean.
     
    Bart Nessux, Jan 16, 2004
    #9
  10. Bart Nessux

    Bart Nessux Guest

    Terry Reedy wrote:
    > "Bart Nessux" <> wrote in message
    > news:bu6rvn$njg$...
    >
    >>Jeff Epler wrote:
    >>
    >>>But why are *you* using
    >>> random.sample(range(len(x)), 25)
    >>>instead of
    >>> random.sample(x, 25)
    >>>?

    >
    >
    >>Because it works and it's fast and len(count) changes every drawing.

    >
    >
    > I think you missed Jeff's point, which is that you are repeating part of
    > the work that sample tries to do for you. From the Lib Ref:
    > "
    > sample(sequence, k): Return a k length list of unique elements chosen from
    > the population sequence. Used for random sampling without replacement. New
    > in version 2.3.
    >
    > Returns a new list containing elements from the population while leaving
    > the original population unchanged. The resulting list is in selection order
    > so that all sub-slices will also be valid random samples. This allows
    > raffle winners (the sample) to be partitioned into grand prize and second
    > place winners (the subslices).
    > "
    > When you get the sample from range(n), you have to use them as indexes into
    > x to get the actual list of names. But the indexing and extraction is what
    > sample would do if you gave it x instead of range(x)!
    >
    > Terry J. Reedy
    >
    >


    Also, the below statement should be removed from random's
    documentation... it's where I got the idea to do:

    random.sample(range(len(x)), 25)
    instead of
    random.sample(x, 25)

    "To choose a sample from a range of integers, use xrange
    as an argument. This is especially fast and space efficient
    for sampling from a large population: sample(xrange(10000000), 60)."

    http://www.python.org/doc/current/lib/module-random.html
     
    Bart Nessux, Jan 16, 2004
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. globalrev
    Replies:
    4
    Views:
    817
    Gabriel Genellina
    Apr 20, 2008
  2. blaine
    Replies:
    2
    Views:
    253
    Carl Banks
    May 29, 2008
  3. joseph cook
    Replies:
    1
    Views:
    936
    Steve
    Nov 28, 2009
  4. Steve
    Replies:
    0
    Views:
    291
    Steve
    Nov 28, 2009
  5. Don Bruder
    Replies:
    3
    Views:
    1,018
    spikeysnack
    Aug 3, 2010
Loading...

Share This Page