[QUIZ] Sampling (#39)

Discussion in 'Ruby' started by Graeme Defty, Jul 17, 2005.

  1. Graeme Defty

    Graeme Defty Guest

    Sorry, all. New to the list. I accidentally posted my
    previous under a nonsense user id :)

    > > Unless I'm mistaken, in the 5e6-from-1e9 sampling,

    > the probability
    > > that a sampling contains exactly one number from a

    > given 200-numbers
    > > interval is 200.0*(1/200)*(199.0/200)**199 =3D

    > 0.3688. The probability
    > > that this happens for 20 such 200-numbers

    > intervals is
    > > 0.3688**20 =3D 2.1e-09.

    >=20
    > I think you are wrong. A rough estimates of this
    > probability gives me
    > 1e-2171438.
    > Actually, it should be a bit smaller, but at least
    > the order of
    > magnitude of the order of magnitude is right :)
    > Quite impressive...
    >=20
    > Paolo
    >=20
    >=20

    I think he is wrong too (sorry. Missed the previous
    post, so I don't know who it was) but I think I think
    it's wrong for a different reason.

    I got the probability of getting exactly one number in
    a specific interval to be as follows:
    Probability of one *specific* number in the interval
    1/20
    Probability of all the other 19 OUT of that interval
    (19/20)**19
    But we have 20 specific numbers to consider, so:
    20 * 1/20 * (19/20)**19
    or
    (19.0/20)**19
    which gives a remarkably similar 0.3773536 (don't know
    if this is a co-incidence. My brain hurts :)

    Where I think this goes wrong is in assuming that
    expanding this to 20 intervals just needs a **20.=20

    After we get exactly one number in the first interval,
    we then are looking for exactly one number in the
    second interval, which is only one interval out of 19.
    In other words, the probability should
    be(18.0/19)**18. and so forth. The overall probability
    therefore is
    (19.0/20)**19 * (18.0/19)**18 * ... * (2.0/1)**1
    which my RubyShell informs me is 0.377353602535307e-8,
    or thereabouts :)

    Graeme

    Send instant messages to your online friends http://uk.messenger.yahoo.co=
    m=20
    Graeme Defty, Jul 17, 2005
    #1
    1. Advertising

  2. Graeme Defty <> writes:

    > Sorry, all. New to the list. I accidentally posted my
    > previous under a nonsense user id :)
    >
    >> > Unless I'm mistaken, in the 5e6-from-1e9 sampling,

    >> the probability
    >> > that a sampling contains exactly one number from a

    >> given 200-numbers
    >> > interval is 200.0*(1/200)*(199.0/200)**199 =

    >> 0.3688. The probability
    >> > that this happens for 20 such 200-numbers

    >> intervals is
    >> > 0.3688**20 = 2.1e-09.

    >>
    >> I think you are wrong. A rough estimates of this
    >> probability gives me
    >> 1e-2171438.
    >> Actually, it should be a bit smaller, but at least
    >> the order of
    >> magnitude of the order of magnitude is right :)
    >> Quite impressive...
    >>
    >> Paolo
    >>
    >>

    > I think he is wrong too (sorry. Missed the previous
    > post, so I don't know who it was)


    Me.

    > but I think I think
    > it's wrong for a different reason.
    >
    > I got the probability of getting exactly one number in
    > a specific interval to be as follows:
    > Probability of one *specific* number in the interval
    > 1/20
    > Probability of all the other 19 OUT of that interval
    > (19/20)**19
    > But we have 20 specific numbers to consider, so:
    > 20 * 1/20 * (19/20)**19
    > or
    > (19.0/20)**19


    Right, that's what I did too, but I was talking about a 200-numbers
    interval, not a 20-numbers one :)

    > which gives a remarkably similar 0.3773536 (don't know
    > if this is a co-incidence.


    It isn't. ((x-1)/x)**(x-1) converges for x->Infinity.

    = lim_{x->\inf} ((x-1)/x)**(x-1)
    = lim_{x->\inf} (1-1/x)**(x-1)
    = lim_{x->\inf} (1-1/x)**x * lim_{x->\inf} (1-1/x)**-1
    = lim_{x->\inf} (1-1/x)**x * 1
    = lim_{x->\inf} (1-1/x)**x
    = lim_{x->\inf} (1/(1+1/x))**x
    = lim_{x->\inf} 1/((1+1/x)**x)
    = 1 / lim_{x->\inf} ((1+1/x)**x)
    = 1/E
    = 0.3678794...

    How about that :)

    >
    > Where I think this goes wrong is in assuming that
    > expanding this to 20 intervals just needs a **20.
    >
    > After we get exactly one number in the first interval,
    > we then are looking for exactly one number in the
    > second interval, which is only one interval out of 19.
    > In other words, the probability should
    > be(18.0/19)**18. and so forth. The overall probability
    > therefore is
    > (19.0/20)**19 * (18.0/19)**18 * ... * (2.0/1)**1
    > which my RubyShell informs me is 0.377353602535307e-8,
    > or thereabouts :)


    I didn't get that. You have 20 given (previously selected) intervals,
    and you want the probability that a sampling contains exactly one
    number from each of them. That gives (with your interval size)
    p=(19/20)**19 for the first given interval, (19/20)**19 for the second
    one and so on. ((19/20)**19)**20 in total.

    Olaf
    Olaf Klischat, Jul 17, 2005
    #2
    1. Advertising

  3. On 7/17/05, Olaf Klischat <-berlin.de> wrote:
    > Graeme Defty <> writes:
    >=20
    > > Sorry, all. New to the list. I accidentally posted my
    > > previous under a nonsense user id :)
    > >
    > >> > Unless I'm mistaken, in the 5e6-from-1e9 sampling,
    > >> the probability
    > >> > that a sampling contains exactly one number from a
    > >> given 200-numbers
    > >> > interval is 200.0*(1/200)*(199.0/200)**199 =3D
    > >> 0.3688. The probability
    > >> > that this happens for 20 such 200-numbers
    > >> intervals is
    > >> > 0.3688**20 =3D 2.1e-09.
    > >>
    > >> I think you are wrong. A rough estimates of this
    > >> probability gives me
    > >> 1e-2171438.
    > >> Actually, it should be a bit smaller, but at least
    > >> the order of
    > >> magnitude of the order of magnitude is right :)
    > >> Quite impressive...
    > >>
    > >> Paolo
    > >>
    > >>

    > > I think he is wrong too (sorry. Missed the previous
    > > post, so I don't know who it was)

    >=20
    > Me.
    >=20
    > > but I think I think
    > > it's wrong for a different reason.
    > >
    > > I got the probability of getting exactly one number in
    > > a specific interval to be as follows:
    > > Probability of one *specific* number in the interval
    > > 1/20
    > > Probability of all the other 19 OUT of that interval
    > > (19/20)**19
    > > But we have 20 specific numbers to consider, so:
    > > 20 * 1/20 * (19/20)**19
    > > or
    > > (19.0/20)**19

    >=20
    > Right, that's what I did too, but I was talking about a 200-numbers
    > interval, not a 20-numbers one :)


    It's wrong anyways. You both are assuming that all these events are
    independent, and multiply their probability, but they are not. One
    neat way to see this is remarking that you didn't use the fact that
    the intervals are 5000000. If there were only one such interval (that
    is, if you were doing an 1 out of 200 sampling), the probability would
    have been obviously 1. If your formula were correct, I could apply it
    to this case, obtaining a contradiction.

    Paolo
    Paolo Capriotti, Jul 17, 2005
    #3
  4. Paolo Capriotti <> writes:

    > It's wrong anyways. You both are assuming that all these events are
    > independent, and multiply their probability, but they are not. One
    > neat way to see this is remarking that you didn't use the fact that
    > the intervals are 5000000. If there were only one such interval (that
    > is, if you were doing an 1 out of 200 sampling), the probability would
    > have been obviously 1. If your formula were correct, I could apply it
    > to this case, obtaining a contradiction.


    Yeah. You have a point there ;).

    I still think that 0.3688 and 0.3688**20 are not far off because the
    interval size (200) and number (20) is much smaller than MEMBERS and
    LIMIT, so the occurance of an event like "the sampling contains one
    number from a given (preselected) interval" does not influence the
    probability of another such event (for another, disjunct interval)
    very much. Right?
    Olaf Klischat, Jul 17, 2005
    #4
  5. Olaf Klischat <-berlin.de> writes:

    > Paolo Capriotti <> writes:
    >
    >> It's wrong anyways. You both are assuming that all these events are
    >> independent, and multiply their probability, but they are not. One
    >> neat way to see this is remarking that you didn't use the fact that
    >> the intervals are 5000000. If there were only one such interval (that
    >> is, if you were doing an 1 out of 200 sampling), the probability would
    >> have been obviously 1. If your formula were correct, I could apply it
    >> to this case, obtaining a contradiction.

    >
    > Yeah. You have a point there ;).
    >
    > I still think that 0.3688 and 0.3688**20 are not far off because the
    > interval size (200) and number (20) is much smaller than MEMBERS and
    > LIMIT, so the occurance of an event like "the sampling contains one
    > number from a given (preselected) interval" does not influence the
    > probability of another such event (for another, disjunct interval)
    > very much. Right?


    I've pondered some more and I think, considering "event dependency"
    (i.e. changed probabilities once you've selected some numbers), the
    actual probability that exactly one number occurs in a given
    200-numbers interval isn't (199/200)**199=0.36880183, but
    (1-4_999_999/999_999_999)*
    (1-4_999_999/999_999_998)*
    (1-4_999_999/999_999_997)*
    ..
    ..
    .. *
    (1-4_999_999/999_999_801)
    = 0.368801868

    (so the approximation has an error of only 1e-7), or in the general
    case:

    # probability that exactly one number occurs in a given interval of
    # size intervalsize in a sampling with parameters members and limit
    def p(members,limit,intervalsize)
    (1...intervalsize).inject(intervalsize * members/limit){|x,i|
    x * (1-(members-1)/(limit-i))
    }
    end


    I've now written a "check" script that gets a sample on stdin and
    determines this value (all consecutive intervals of size limit/members
    are checked), and compares it to the theoretically expected one:


    --------------------------------
    #!/usr/bin/ruby

    # probability that exactly one number occurs in a given interval of
    # size intervalsize in a sampling with parameters members and limit
    def p(members,limit,intervalsize)
    (1...intervalsize).inject(intervalsize * members/limit){|x,i|
    x * (1-(members-1)/(limit-i))
    }
    end

    members,limit = ARGV.map{|x|x.to_i}
    warn "#{limit} mod #{members} != 0 -- inaccurate results" if limit%members!=0

    interval = limit/members

    puts "expected: #{p(members.to_f,limit.to_f,interval)}"

    found=0
    begin
    last=0; count=0;
    loop do
    num=STDIN.readline.to_i
    new=num/interval
    if new==last then
    count+=1
    else
    last=new
    found+=1 if count==1
    count=1
    end
    end
    rescue EOFError
    end
    found+=1 if count==1

    puts "actual: #{found.to_f/(limit/interval)}"
    --------------------------------



    For the big_sample.txt here, this gives:

    olaf@tack:~/doc/mydocs/ruby/quiz$ ./check 5_000_000 1_000_000_000 <big_sample.txt
    expected: 0.368801867760757
    actual: 0.368865
    olaf@tack:~/doc/mydocs/ruby/quiz$


    Other runs:

    olaf@tack:~/doc/mydocs/ruby/quiz$ ./sample 5000 70000 | ./check 5000 70000
    expected: 0.381630036177205
    actual: 0.386
    olaf@tack:~/doc/mydocs/ruby/quiz$

    olaf@tack:~/doc/mydocs/ruby/quiz$ ./sample 1 100 | ./check 1 100
    expected: 1.0
    actual: 1.0
    olaf@tack:~/doc/mydocs/ruby/quiz$


    The "cheated" samples that have one value in each interval will always
    yield actual: 1.0.

    All things considered, this seems to be quite a good indicator to tell
    the really random solutions from the not-so-random ones :)

    Olaf
    Olaf Klischat, Jul 17, 2005
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ruby Quiz

    [QUIZ] Animal Quiz (#15)

    Ruby Quiz, Jan 14, 2005, in forum: Ruby
    Replies:
    11
    Views:
    345
    James Edward Gray II
    Jan 18, 2005
  2. David Tran
    Replies:
    9
    Views:
    173
    David Tran
    Jan 21, 2005
  3. Ruby Quiz

    [QUIZ] 1-800-THE-QUIZ (#20)

    Ruby Quiz, Feb 18, 2005, in forum: Ruby
    Replies:
    15
    Views:
    314
    gabriele renzi
    Feb 24, 2005
  4. Ruby Quiz

    [QUIZ] Sampling (#39)

    Ruby Quiz, Jul 15, 2005, in forum: Ruby
    Replies:
    91
    Views:
    768
    Stefan Mahlitz
    Jul 19, 2005
  5. hp gobelli

    [SOLUTION] [QUIZ] Sampling (#39)

    hp gobelli, Jul 18, 2005, in forum: Ruby
    Replies:
    1
    Views:
    88
    Yohanes Santoso
    Jul 18, 2005
Loading...

Share This Page