[QUIZ] Sampling (#39)

Graeme Defty · Jul 17, 2005

Sorry, all. New to the list. I accidentally posted my
previous under a nonsense user id

=20
I think you are wrong. A rough estimates of this
probability gives me
1e-2171438.
Actually, it should be a bit smaller, but at least
the order of
magnitude of the order of magnitude is right
Quite impressive...
=20
Paolo
=20
=20

I think he is wrong too (sorry. Missed the previous
post, so I don't know who it was) but I think I think
it's wrong for a different reason.

I got the probability of getting exactly one number in
a specific interval to be as follows:
Probability of one *specific* number in the interval
1/20
Probability of all the other 19 OUT of that interval
(19/20)**19
But we have 20 specific numbers to consider, so:
20 * 1/20 * (19/20)**19
or
(19.0/20)**19
which gives a remarkably similar 0.3773536 (don't know
if this is a co-incidence. My brain hurts

Where I think this goes wrong is in assuming that
expanding this to 20 intervals just needs a **20.=20

After we get exactly one number in the first interval,
we then are looking for exactly one number in the
second interval, which is only one interval out of 19.
In other words, the probability should
be(18.0/19)**18. and so forth. The overall probability
therefore is
(19.0/20)**19 * (18.0/19)**18 * ... * (2.0/1)**1
which my RubyShell informs me is 0.377353602535307e-8,
or thereabouts

Graeme

Send instant messages to your online friends http://uk.messenger.yahoo.co=
m=20

Olaf Klischat · Jul 17, 2005

Graeme Defty said:
Sorry, all. New to the list. I accidentally posted my
previous under a nonsense user id

I think he is wrong too (sorry. Missed the previous
post, so I don't know who it was)
Me.

but I think I think
it's wrong for a different reason.

I got the probability of getting exactly one number in
a specific interval to be as follows:
Probability of one *specific* number in the interval
1/20
Probability of all the other 19 OUT of that interval
(19/20)**19
But we have 20 specific numbers to consider, so:
20 * 1/20 * (19/20)**19
or
(19.0/20)**19

Right, that's what I did too, but I was talking about a 200-numbers
interval, not a 20-numbers one

which gives a remarkably similar 0.3773536 (don't know
if this is a co-incidence.

It isn't. ((x-1)/x)**(x-1) converges for x->Infinity.

= lim_{x->\inf} ((x-1)/x)**(x-1)
= lim_{x->\inf} (1-1/x)**(x-1)
= lim_{x->\inf} (1-1/x)**x * lim_{x->\inf} (1-1/x)**-1
= lim_{x->\inf} (1-1/x)**x * 1
= lim_{x->\inf} (1-1/x)**x
= lim_{x->\inf} (1/(1+1/x))**x
= lim_{x->\inf} 1/((1+1/x)**x)
= 1 / lim_{x->\inf} ((1+1/x)**x)
= 1/E
= 0.3678794...

How about that

Where I think this goes wrong is in assuming that
expanding this to 20 intervals just needs a **20.

After we get exactly one number in the first interval,
we then are looking for exactly one number in the
second interval, which is only one interval out of 19.
In other words, the probability should
be(18.0/19)**18. and so forth. The overall probability
therefore is
(19.0/20)**19 * (18.0/19)**18 * ... * (2.0/1)**1
which my RubyShell informs me is 0.377353602535307e-8,
or thereabouts

I didn't get that. You have 20 given (previously selected) intervals,
and you want the probability that a sampling contains exactly one
number from each of them. That gives (with your interval size)
p=(19/20)**19 for the first given interval, (19/20)**19 for the second
one and so on. ((19/20)**19)**20 in total.

Olaf

Paolo Capriotti · Jul 17, 2005

=20
Right, that's what I did too, but I was talking about a 200-numbers
interval, not a 20-numbers one

It's wrong anyways. You both are assuming that all these events are
independent, and multiply their probability, but they are not. One
neat way to see this is remarking that you didn't use the fact that
the intervals are 5000000. If there were only one such interval (that
is, if you were doing an 1 out of 200 sampling), the probability would
have been obviously 1. If your formula were correct, I could apply it
to this case, obtaining a contradiction.

Paolo

Olaf Klischat · Jul 17, 2005

Paolo Capriotti said:
It's wrong anyways. You both are assuming that all these events are
independent, and multiply their probability, but they are not. One
neat way to see this is remarking that you didn't use the fact that
the intervals are 5000000. If there were only one such interval (that
is, if you were doing an 1 out of 200 sampling), the probability would
have been obviously 1. If your formula were correct, I could apply it
to this case, obtaining a contradiction.

Yeah. You have a point there

.

I still think that 0.3688 and 0.3688**20 are not far off because the
interval size (200) and number (20) is much smaller than MEMBERS and
LIMIT, so the occurance of an event like "the sampling contains one
number from a given (preselected) interval" does not influence the
probability of another such event (for another, disjunct interval)
very much. Right?

Olaf Klischat · Jul 17, 2005

Olaf Klischat said:
Yeah. You have a point there .

I still think that 0.3688 and 0.3688**20 are not far off because the
interval size (200) and number (20) is much smaller than MEMBERS and
LIMIT, so the occurance of an event like "the sampling contains one
number from a given (preselected) interval" does not influence the
probability of another such event (for another, disjunct interval)
very much. Right?

I've pondered some more and I think, considering "event dependency"
(i.e. changed probabilities once you've selected some numbers), the
actual probability that exactly one number occurs in a given
200-numbers interval isn't (199/200)**199=0.36880183, but
(1-4_999_999/999_999_999)*
(1-4_999_999/999_999_998)*
(1-4_999_999/999_999_997)*
..
..
.. *
(1-4_999_999/999_999_801)
= 0.368801868

(so the approximation has an error of only 1e-7), or in the general
case:

# probability that exactly one number occurs in a given interval of
# size intervalsize in a sampling with parameters members and limit
def p(members,limit,intervalsize)
(1...intervalsize).inject(intervalsize * members/limit){|x,i|
x * (1-(members-1)/(limit-i))
}
end

I've now written a "check" script that gets a sample on stdin and
determines this value (all consecutive intervals of size limit/members
are checked), and compares it to the theoretically expected one:

--------------------------------
#!/usr/bin/ruby

# probability that exactly one number occurs in a given interval of
# size intervalsize in a sampling with parameters members and limit
def p(members,limit,intervalsize)
(1...intervalsize).inject(intervalsize * members/limit){|x,i|
x * (1-(members-1)/(limit-i))
}
end

members,limit = ARGV.map{|x|x.to_i}
warn "#{limit} mod #{members} != 0 -- inaccurate results" if limit%members!=0

interval = limit/members

puts "expected: #{p(members.to_f,limit.to_f,interval)}"

found=0
begin
last=0; count=0;
loop do
num=STDIN.readline.to_i
new=num/interval
if new==last then
count+=1
else
last=new
found+=1 if count==1
count=1
end
end
rescue EOFError
end
found+=1 if count==1

puts "actual: #{found.to_f/(limit/interval)}"
--------------------------------

For the big_sample.txt here, this gives:

olaf@tack:~/doc/mydocs/ruby/quiz$ ./check 5_000_000 1_000_000_000 <big_sample.txt
expected: 0.368801867760757
actual: 0.368865
olaf@tack:~/doc/mydocs/ruby/quiz$

Other runs:

olaf@tack:~/doc/mydocs/ruby/quiz$ ./sample 5000 70000 | ./check 5000 70000
expected: 0.381630036177205
actual: 0.386
olaf@tack:~/doc/mydocs/ruby/quiz$

olaf@tack:~/doc/mydocs/ruby/quiz$ ./sample 1 100 | ./check 1 100
expected: 1.0
actual: 1.0
olaf@tack:~/doc/mydocs/ruby/quiz$

The "cheated" samples that have one value in each interval will always
yield actual: 1.0.

All things considered, this seems to be quite a good indicator to tell
the really random solutions from the not-so-random ones

Olaf

[QUIZ] Sampling (#39)	91	Jul 15, 2005
Problem with codewars.	5	Dec 4, 2023
Minimum Total Difficulty	0	Nov 15, 2023
[SUMMARY] Sampling (#39)	0	Jul 21, 2005
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Taskcproblem calendar	4	Aug 31, 2023
Help figuring out a directory permission change problem	1	May 12, 2023
A number everyday of the month "and" a different number depending on the day of the month´s day time	2	Mar 16, 2021

[QUIZ] Sampling (#39)

Graeme Defty

Olaf Klischat

Paolo Capriotti

Olaf Klischat

Olaf Klischat

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads