Distribution

prince.pangeni · Mar 20, 2012

Hi all,
I am doing a simulation project using Python. In my project, I want
to use some sort of distribution to generate requests to a server.
The requests should follow two distributions: one for the request arrival
rate (should be Poisson) and another for the request mix (i.e. out of the
total requests defined by the arrival rate, how many requests are
of which type).
Example: Suppose the request rate is 90 req/sec (generated using a
Poisson distribution) at time t and we have 3 types of requests (i.e.
r1, r2, r3). The request mix distribution output should be similar to:
{r1 : 50 , r2 : 30 , r3 : 10} (i.e. out of 90 requests, 50 are of r1
type, 30 are of r2 type and 10 are of r3 type).
As I am new to the Python distribution module, I am not getting how to
code this situation. Please help me out with the same.

Thanks in advance

Prince

Peter Otten · Mar 20, 2012

You don't say what distribution module you're talking of, and I guess I'm
not the only one who'd need to know that detail.

However, with sufficient resolution and duration the naive approach sketched
below might be good enough.

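# untested
import random

DURATION = 3600    # run for one hour
RATE = 90          # requests/sec
RESOLUTION = 1000  # one msec

r1, r2, r3 = "r1", "r2", "r3"  # placeholder request types

def issue_request_at(request, time):
    pass  # placeholder: send `request` at time slot `time`

requests = [r1]*50 + [r2]*30 + [r3]*10
time_slots = [0]*(RESOLUTION*DURATION)

# scatter the expected number of requests over the time slots
for _ in range(DURATION*RATE):
    time_slots[random.randrange(len(time_slots))] += 1

for time, count in enumerate(time_slots):
    for _ in range(count):
        issue_request_at(random.choice(requests), time)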

Robert Kern · Mar 20, 2012

What is a distribution? That term already means something in Python
jargon, and it doesn't match the rest of your use case.

So what do you mean by "distribution"? Maybe we can find a less
confusing term.

Judging from the context, he means a probability distribution.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Robert Kern · Mar 20, 2012

Hi all,
I am doing a simulation project using Python. In my project, I want
to use some sort of distribution to generate requests to a server.
The requests should follow two distributions: one for the request arrival
rate (should be Poisson) and another for the request mix (i.e. out of the
total requests defined by the arrival rate, how many requests are
of which type).
Example: Suppose the request rate is 90 req/sec (generated using a
Poisson distribution)

Just a note on terminology to be sure we're clear: a Poisson *distribution*
models the number of arrivals in a given time period if the events are from a
Poisson *process* with a given mean rate. To model the inter-event arrival
times, you use an exponential distribution. If you want to handle events
individually in your simulation, you will need to use the exponential
distribution to figure out the exact times for each. If you are handling all of
the events in each second "in bulk" without regard to the exact times or
ordering within that second, then you can use a Poisson distribution.
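
To make the distinction concrete, here is a minimal sketch (not from the thread) that draws exponential inter-arrival gaps with random.expovariate and then counts the arrivals in each one-second bucket. The helper names arrival_times and counts_per_second and the 200-second duration are made up for illustration. If the process is Poisson, the per-second counts should have a mean and a variance that are both close to the rate.

```python
import random

def arrival_times(rate, duration, prng):
    """Return event times in [0, duration) with exponential gaps (mean 1/rate)."""
    times = []
    t = prng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += prng.expovariate(rate)
    return times

def counts_per_second(times, duration):
    """Bucket event times into whole seconds and count each bucket."""
    counts = [0] * int(duration)
    for t in times:
        counts[int(t)] += 1
    return counts

prng = random.Random(42)
rate = 90.0      # reqs/sec, as in the original post
duration = 200   # seconds of simulated time (arbitrary choice)
counts = counts_per_second(arrival_times(rate, duration, prng), duration)

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean, var)  # both should land near the rate
```

The individual event times are what you need for the event-by-event approach; the bucketed counts are what a direct Poisson draw would give you for the "in bulk" approach.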

at time t and we have 3 types of requests (i.e.
r1, r2, r3). The request mix distribution output should be similar to:
{r1 : 50 , r2 : 30 , r3 : 10} (i.e. out of 90 requests, 50 are of r1
type, 30 are of r2 type and 10 are of r3 type).
As I am new to the Python distribution module, I am not getting how to
code this situation. Please help me out with the same.

I am going to assume that you want to handle each event independently. A basic
strategy is to keep a time variable starting at 0 and use a while loop until the
time reaches the end of the simulation time. Increment it using a draw from the
exponential distribution each loop. Each iteration of the loop is an event. To
determine the kind of event, you will need to draw from a weighted discrete
distribution. What you want to do here is to do a cumulative sum of the weights,
draw a uniform number from 0 to the total sum, then use bisect to find the item
that matches.

import bisect
import random

# Use a seeded PRNG for repeatability. Use the methods on the Random
# object rather than the functions in the random module.
prng = random.Random(1234567890)

avg_rate = 90.0 # reqs/sec

kind_weights = [50.0, 30.0, 10.0]
kind_cumsum = [sum(kind_weights[:i+1]) for i in range(len(kind_weights))]
kind_max = kind_cumsum[-1]

max_time = 10.0 # sec
t = 0.0 # sec
events = [] # (t, kind)
while t < max_time:
    dt = prng.expovariate(avg_rate)
    u = prng.uniform(0.0, kind_max)
    kind = bisect.bisect_left(kind_cumsum, u)
    events.append((t, kind))
    t += dt
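
On Python 3.6 and later, random.choices accepts weights (or cum_weights) and performs the cumulative-sum-and-bisect step internally. The sketch below is an alternative, not what the post above uses; it reuses the 50/30/10 weights and tallies the resulting mix with collections.Counter. The kind labels and the draw_mix helper are placeholders for illustration.

```python
import random
from collections import Counter

prng = random.Random(1234567890)

kinds = ["r1", "r2", "r3"]          # placeholder labels for the request kinds
kind_weights = [50.0, 30.0, 10.0]   # relative weights; they need not sum to 1

def draw_mix(n, prng):
    """Draw n request kinds at random and return a Counter of each kind."""
    return Counter(prng.choices(kinds, weights=kind_weights, k=n))

mix = draw_mix(90, prng)
print(dict(mix))  # a mix near 50/30/10 whose counts sum to 90
```

Because each draw is random, a single batch of 90 will rarely be exactly 50/30/10; only the long-run proportions match the weights.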


Dennis Lee Bieber · Mar 20, 2012

Judging from the context, he means a probability distribution.

Furthermore, he explicitly mentions Poisson later in the text.

That should just be a case of finding the mathematical definition of
the distribution and coding something using random (since the random
module has gaussian/normal, but not Poisson -- maybe numpy/simpy?) to
obtain the time variant.
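
One way to do that with only the random module is Knuth's multiplication method: keep multiplying uniform draws together until the product falls to exp(-lambda) or below, and count how many draws it took. This is a sketch with a hypothetical function name, poisson_variate. It is fine for a mean of 90, but math.exp(-lam) underflows for means of several hundred, where a library sampler would be the safer choice.

```python
import math
import random

def poisson_variate(lam, prng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count = 0
    product = prng.random()
    while product > limit:
        count += 1
        product *= prng.random()
    return count

prng = random.Random(2012)
per_second = [poisson_variate(90.0, prng) for _ in range(10)]
print(per_second)  # ten per-second request counts scattered around 90
```

The running time grows with the mean, since each sample needs about lam + 1 uniform draws.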

I suspect the real question is how to handle the weighted selection
of r1-r3. And for such a relatively short sample set (90 entries), the
chunked list shown previously, with randint to index into it is viable.
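
A sketch of that chunked-list idea, assuming placeholder string labels for the three request types: build a list with one entry per request in the example mix, pick a random index for each request, and tally the result. It uses randrange(len(pool)) rather than randint because randrange excludes the upper bound, so no "- 1" is needed. The request_mix helper is hypothetical.

```python
import random
from collections import Counter

prng = random.Random(7)

# 50 + 30 + 10 = 90 entries, one per request in the example mix
pool = ["r1"] * 50 + ["r2"] * 30 + ["r3"] * 10

def request_mix(n_requests, prng):
    """Pick n_requests types from the pool (with replacement) and count them."""
    picks = (pool[prng.randrange(len(pool))] for _ in range(n_requests))
    return Counter(picks)

mix = request_mix(90, prng)
print(dict(mix))  # roughly 50/30/10; the counts always sum to 90
```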

The alternative, more generalized (in a way), would be to use a
sorted list of the probabilities (or sums)... (pseudo-code)

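import random

numR1, numR2, numR3 = 50, 30, 10   # example counts from the original post
r1, r2, r3 = "r1", "r2", "r3"      # placeholder request types

probs = [(numR1, r1),
         (numR1 + numR2, r2),
         (numR1 + numR2 + numR3, r3)]
# obviously one is unlikely to actually create constants for the numRs;
# more likely to create the sums in line
probs.sort()

# randint() includes both endpoints, so draw from 1..total
ran = random.randint(1, numR1 + numR2 + numR3)
for (p, r) in probs:
    if ran > p:
        continue
    theR = r
    break   # stop at the first cumulative sum that covers ran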

Laurent Claessens · Mar 20, 2012

On 20/03/2012 12:21, Ben Finney wrote:
I guess scipy is also available in plain Python (didn't check), but the
following works with Sage:

----------------------------------------------------------------------
| Sage Version 4.8, Release Date: 2012-01-20 |
| Type notebook() for the GUI, and license() for information. |
----------------------------------------------------------------------
sage: from scipy import stats
sage: X=stats.poisson.rvs
sage: X(4)
5
sage: X(4)
2
sage: X(4)
3

Hope it helps
Laurent

Dennis Lee Bieber · Mar 20, 2012

module has gaussian/normal, but not Poisson -- maybe numpy/simpy?) to
obtain the time variant.

Whoops -- scipy, not simpy
