Re: Random Drawing Simulation -- performance issue

Discussion in 'Python' started by Travis E. Oliphant, Sep 13, 2006.

  1. Brendon Towle wrote:
    > I need to simulate scenarios like the following: "You have a deck of
    > 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    > replace it, and repeat N times."
    >


    Thinking about the problem as drawing sample froms a discrete
    distribution defined by the population might help.

    For example, in SciPy you can define your own discrete random variable:

    var = scipy.stats.rv_discrete(name='sample',
    values=([0,1,2],[3/10.,5/10.,2/10.]))

    Then

    var.rvs(size=10000)

    would quickly return an array of "draws"

    If you didn't want to install SciPy, but were willing to install NumPy,
    then the crux of the algorithm is to generate an entire array of uniform
    random variables: numpy.random.rand(count) and then map them through
    the inverse cumulative distribution function to generate your samples.
    The inverse cumulative distribution function is just a series of steps
    whose width depends on the probablity distribution.

    Thus, the population defines the draw boundaries. To make it easy, just
    multiply the uniform random number by the total number of cards and then
    the boundaries are on the integers of the number of each kind of card.

    Here is an implementation. I find this version to be 2x - 5x faster
    depending on how many draws are used.


    import numpy as N

    def randomDrawing_N(count, population):
    probs = [x[0] for x in population]
    res = [[0, item[1]] for item in population]
    nums = N.random.rand(count)
    cprobs = N.cumsum([0]+probs)
    nums *= cprobs[-1]
    for k, val in enumerate(probs):
    res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
    return res


    -Travis Oliphant
    Travis E. Oliphant, Sep 13, 2006
    #1
    1. Advertising

  2. Travis E. Oliphant

    Paul Rubin Guest

    "Travis E. Oliphant" <> writes:
    > > I need to simulate scenarios like the following: "You have a deck of
    > > 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    > > replace it, and repeat N times."
    > >

    > Thinking about the problem as drawing sample froms a discrete
    > distribution defined by the population might help.


    Is there some important reason you want to do this as a simulation?
    And is the real problem more complicated? If you draw from the
    distribution 100,000 times with replacement and sum the results, per
    the Central Limit Theorem you'll get something very close to a normal
    distribution whose parameters you can determine analytically. There
    is probably also some statistics formula to find the precise error.
    So you can replace the 100,000 draws with a single draw.
    Paul Rubin, Sep 13, 2006
    #2
    1. Advertising

  3. Travis E. Oliphant

    Robert Kern Guest

    Paul Rubin wrote:
    > "Travis E. Oliphant" <> writes:
    >>> I need to simulate scenarios like the following: "You have a deck of
    >>> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    >>> replace it, and repeat N times."
    >>>

    >> Thinking about the problem as drawing sample froms a discrete
    >> distribution defined by the population might help.

    >
    > Is there some important reason you want to do this as a simulation?
    > And is the real problem more complicated? If you draw from the
    > distribution 100,000 times with replacement and sum the results, per
    > the Central Limit Theorem you'll get something very close to a normal
    > distribution whose parameters you can determine analytically. There
    > is probably also some statistics formula to find the precise error.
    > So you can replace the 100,000 draws with a single draw.


    Along the lines of what you're trying to get at, the problem that the OP is
    describing is one of sampling from a multinomial distribution.

    http://en.wikipedia.org/wiki/Multinomial_distribution

    numpy has a function that will do the sampling for you:


    In [4]: numpy.random.multinomial?
    Type: builtin_function_or_method
    Base Class: <type 'builtin_function_or_method'>
    String Form: <built-in method multinomial of mtrand.RandomState object at
    0x3e140>
    Namespace: Interactive
    Docstring:
    Multinomial distribution.

    multinomial(n, pvals, size=None) -> random values

    pvals is a sequence of probabilities that should sum to 1 (however, the
    last element is always assumed to account for the remaining probability
    as long as sum(pvals[:-1]) <= 1).


    Sampling from the multinomial distribution is quite simply implemented given a
    binomial sampler. Unfortunately, the standard library's random module does not
    have one. If the number of samples is high enough, then one might be able to
    approximate the binomial distribution with a normal one, but you'd be better off
    just installing numpy.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, Sep 13, 2006
    #3
  4. Travis E. Oliphant wrote:
    > Brendon Towle wrote:
    >> I need to simulate scenarios like the following: "You have a deck of
    >> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    >> replace it, and repeat N times."
    >>

    >
    > Thinking about the problem as drawing sample froms a discrete
    > distribution defined by the population might help.
    >
    > For example, in SciPy you can define your own discrete random variable:
    >
    > var = scipy.stats.rv_discrete(name='sample',
    > values=([0,1,2],[3/10.,5/10.,2/10.]))
    >
    > Then
    >
    > var.rvs(size=10000)
    >
    > would quickly return an array of "draws"
    >
    > If you didn't want to install SciPy, but were willing to install NumPy,
    > then the crux of the algorithm is to generate an entire array of uniform
    > random variables: numpy.random.rand(count) and then map them through
    > the inverse cumulative distribution function to generate your samples.
    > The inverse cumulative distribution function is just a series of steps
    > whose width depends on the probablity distribution.
    >
    > Thus, the population defines the draw boundaries. To make it easy, just
    > multiply the uniform random number by the total number of cards and then
    > the boundaries are on the integers of the number of each kind of card.
    >
    > Here is an implementation. I find this version to be 2x - 5x faster
    > depending on how many draws are used.
    >
    >
    > import numpy as N
    >
    > def randomDrawing_N(count, population):
    > probs = [x[0] for x in population]
    > res = [[0, item[1]] for item in population]
    > nums = N.random.rand(count)
    > cprobs = N.cumsum([0]+probs)
    > nums *= cprobs[-1]
    > for k, val in enumerate(probs):
    > res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
    > return res
    >
    >
    > -Travis Oliphant
    >


    In response to what Travis and Simon wrote -
    (1) Where the heck can I find a description of scipy's stat functions?
    Documentation on these seems sparse.

    (2) How does one set up a good timer for Python as implemented for
    Windows? (i.e., how can I make API calls to Windows from Python?)

    Please bear with me, or not; I am /just/ starting off with Python.

    Thanks,
    Dave Braden
    David J. Braden, Sep 13, 2006
    #4
  5. David J. Braden wrote:
    > Travis E. Oliphant wrote:
    >> Brendon Towle wrote:
    >>> I need to simulate scenarios like the following: "You have a deck of
    >>> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
    >>> replace it, and repeat N times."
    >>>

    >>
    >> Thinking about the problem as drawing sample froms a discrete
    >> distribution defined by the population might help.
    >>
    >> For example, in SciPy you can define your own discrete random variable:
    >>
    >> var = scipy.stats.rv_discrete(name='sample',
    >> values=([0,1,2],[3/10.,5/10.,2/10.]))
    >>
    >> Then
    >>
    >> var.rvs(size=10000)
    >>
    >> would quickly return an array of "draws"
    >>
    >> If you didn't want to install SciPy, but were willing to install
    >> NumPy, then the crux of the algorithm is to generate an entire array
    >> of uniform random variables: numpy.random.rand(count) and then map
    >> them through the inverse cumulative distribution function to generate
    >> your samples. The inverse cumulative distribution function is just a
    >> series of steps whose width depends on the probablity distribution.
    >>
    >> Thus, the population defines the draw boundaries. To make it easy,
    >> just multiply the uniform random number by the total number of cards
    >> and then the boundaries are on the integers of the number of each kind
    >> of card.
    >>
    >> Here is an implementation. I find this version to be 2x - 5x faster
    >> depending on how many draws are used.
    >>
    >>
    >> import numpy as N
    >>
    >> def randomDrawing_N(count, population):
    >> probs = [x[0] for x in population]
    >> res = [[0, item[1]] for item in population]
    >> nums = N.random.rand(count)
    >> cprobs = N.cumsum([0]+probs)
    >> nums *= cprobs[-1]
    >> for k, val in enumerate(probs):
    >> res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
    >> return res
    >>
    >>
    >> -Travis Oliphant
    >>

    >
    > In response to what Travis and Simon wrote -
    > (1) Where the heck can I find a description of scipy's stat functions?
    > Documentation on these seems sparse.
    >
    > (2) How does one set up a good timer for Python as implemented for
    > Windows? (i.e., how can I make API calls to Windows from Python?)
    >
    > Please bear with me, or not; I am /just/ starting off with Python.
    >
    > Thanks,
    > Dave Braden


    Oops, or rather, duh.
    Rereading Robert's post clarified for me where multinomial is, and that
    to get help I need to specify more than simply help(multinomial).
    Sheesh, I'm dim today.

    My question re timing code stands.

    TIA,
    DaveB
    David J. Braden, Sep 13, 2006
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    1
    Views:
    619
    Kevin Spencer
    Jan 9, 2006
  2. Brendon Towle
    Replies:
    3
    Views:
    423
    Simon Forman
    Sep 12, 2006
  3. globalrev
    Replies:
    4
    Views:
    742
    Gabriel Genellina
    Apr 20, 2008
  4. defn noob
    Replies:
    1
    Views:
    370
    Mark Space
    Jun 28, 2008
  5. VK
    Replies:
    15
    Views:
    1,119
    Dr J R Stockton
    May 2, 2010
Loading...

Share This Page