[QUIZ] Sampling (#39)

G

Graeme Defty

Sorry, all. New to the list. I accidentally posted my
previous under a nonsense user id :)
=20
I think you are wrong. A rough estimates of this
probability gives me
1e-2171438.
Actually, it should be a bit smaller, but at least
the order of
magnitude of the order of magnitude is right :)
Quite impressive...
=20
Paolo
=20
=20
I think he is wrong too (sorry. Missed the previous
post, so I don't know who it was) but I think I think
it's wrong for a different reason.

I got the probability of getting exactly one number in
a specific interval to be as follows:
Probability of one *specific* number in the interval
1/20
Probability of all the other 19 OUT of that interval
(19/20)**19
But we have 20 specific numbers to consider, so:
20 * 1/20 * (19/20)**19
or
(19.0/20)**19
which gives a remarkably similar 0.3773536 (don't know
if this is a co-incidence. My brain hurts :)

Where I think this goes wrong is in assuming that
expanding this to 20 intervals just needs a **20.=20

After we get exactly one number in the first interval,
we then are looking for exactly one number in the
second interval, which is only one interval out of 19.
In other words, the probability should
be(18.0/19)**18. and so forth. The overall probability
therefore is
(19.0/20)**19 * (18.0/19)**18 * ... * (2.0/1)**1
which my RubyShell informs me is 0.377353602535307e-8,
or thereabouts :)

Graeme

Send instant messages to your online friends http://uk.messenger.yahoo.co=
m=20
 
O

Olaf Klischat

Graeme Defty said:
Sorry, all. New to the list. I accidentally posted my
previous under a nonsense user id :)

I think he is wrong too (sorry. Missed the previous
post, so I don't know who it was)
Me.

but I think I think
it's wrong for a different reason.

I got the probability of getting exactly one number in
a specific interval to be as follows:
Probability of one *specific* number in the interval
1/20
Probability of all the other 19 OUT of that interval
(19/20)**19
But we have 20 specific numbers to consider, so:
20 * 1/20 * (19/20)**19
or
(19.0/20)**19

Right, that's what I did too, but I was talking about a 200-numbers
interval, not a 20-numbers one :)
which gives a remarkably similar 0.3773536 (don't know
if this is a co-incidence.

It isn't. ((x-1)/x)**(x-1) converges for x->Infinity.

= lim_{x->\inf} ((x-1)/x)**(x-1)
= lim_{x->\inf} (1-1/x)**(x-1)
= lim_{x->\inf} (1-1/x)**x * lim_{x->\inf} (1-1/x)**-1
= lim_{x->\inf} (1-1/x)**x * 1
= lim_{x->\inf} (1-1/x)**x
= lim_{x->\inf} (1/(1+1/x))**x
= lim_{x->\inf} 1/((1+1/x)**x)
= 1 / lim_{x->\inf} ((1+1/x)**x)
= 1/E
= 0.3678794...

How about that :)
Where I think this goes wrong is in assuming that
expanding this to 20 intervals just needs a **20.

After we get exactly one number in the first interval,
we then are looking for exactly one number in the
second interval, which is only one interval out of 19.
In other words, the probability should
be(18.0/19)**18. and so forth. The overall probability
therefore is
(19.0/20)**19 * (18.0/19)**18 * ... * (2.0/1)**1
which my RubyShell informs me is 0.377353602535307e-8,
or thereabouts :)

I didn't get that. You have 20 given (previously selected) intervals,
and you want the probability that a sampling contains exactly one
number from each of them. That gives (with your interval size)
p=(19/20)**19 for the first given interval, (19/20)**19 for the second
one and so on. ((19/20)**19)**20 in total.

Olaf
 
P

Paolo Capriotti

=20
Right, that's what I did too, but I was talking about a 200-numbers
interval, not a 20-numbers one :)

It's wrong anyways. You both are assuming that all these events are
independent, and multiply their probability, but they are not. One
neat way to see this is remarking that you didn't use the fact that
the intervals are 5000000. If there were only one such interval (that
is, if you were doing an 1 out of 200 sampling), the probability would
have been obviously 1. If your formula were correct, I could apply it
to this case, obtaining a contradiction.

Paolo
 
O

Olaf Klischat

Paolo Capriotti said:
It's wrong anyways. You both are assuming that all these events are
independent, and multiply their probability, but they are not. One
neat way to see this is remarking that you didn't use the fact that
the intervals are 5000000. If there were only one such interval (that
is, if you were doing an 1 out of 200 sampling), the probability would
have been obviously 1. If your formula were correct, I could apply it
to this case, obtaining a contradiction.

Yeah. You have a point there ;).

I still think that 0.3688 and 0.3688**20 are not far off because the
interval size (200) and number (20) is much smaller than MEMBERS and
LIMIT, so the occurance of an event like "the sampling contains one
number from a given (preselected) interval" does not influence the
probability of another such event (for another, disjunct interval)
very much. Right?
 
O

Olaf Klischat

Olaf Klischat said:
Yeah. You have a point there ;).

I still think that 0.3688 and 0.3688**20 are not far off because the
interval size (200) and number (20) is much smaller than MEMBERS and
LIMIT, so the occurance of an event like "the sampling contains one
number from a given (preselected) interval" does not influence the
probability of another such event (for another, disjunct interval)
very much. Right?

I've pondered some more and I think, considering "event dependency"
(i.e. changed probabilities once you've selected some numbers), the
actual probability that exactly one number occurs in a given
200-numbers interval isn't (199/200)**199=0.36880183, but
(1-4_999_999/999_999_999)*
(1-4_999_999/999_999_998)*
(1-4_999_999/999_999_997)*
..
..
.. *
(1-4_999_999/999_999_801)
= 0.368801868

(so the approximation has an error of only 1e-7), or in the general
case:

# probability that exactly one number occurs in a given interval of
# size intervalsize in a sampling with parameters members and limit
def p(members,limit,intervalsize)
(1...intervalsize).inject(intervalsize * members/limit){|x,i|
x * (1-(members-1)/(limit-i))
}
end


I've now written a "check" script that gets a sample on stdin and
determines this value (all consecutive intervals of size limit/members
are checked), and compares it to the theoretically expected one:


--------------------------------
#!/usr/bin/ruby

# probability that exactly one number occurs in a given interval of
# size intervalsize in a sampling with parameters members and limit
def p(members,limit,intervalsize)
(1...intervalsize).inject(intervalsize * members/limit){|x,i|
x * (1-(members-1)/(limit-i))
}
end

members,limit = ARGV.map{|x|x.to_i}
warn "#{limit} mod #{members} != 0 -- inaccurate results" if limit%members!=0

interval = limit/members

puts "expected: #{p(members.to_f,limit.to_f,interval)}"

found=0
begin
last=0; count=0;
loop do
num=STDIN.readline.to_i
new=num/interval
if new==last then
count+=1
else
last=new
found+=1 if count==1
count=1
end
end
rescue EOFError
end
found+=1 if count==1

puts "actual: #{found.to_f/(limit/interval)}"
--------------------------------



For the big_sample.txt here, this gives:

olaf@tack:~/doc/mydocs/ruby/quiz$ ./check 5_000_000 1_000_000_000 <big_sample.txt
expected: 0.368801867760757
actual: 0.368865
olaf@tack:~/doc/mydocs/ruby/quiz$


Other runs:

olaf@tack:~/doc/mydocs/ruby/quiz$ ./sample 5000 70000 | ./check 5000 70000
expected: 0.381630036177205
actual: 0.386
olaf@tack:~/doc/mydocs/ruby/quiz$

olaf@tack:~/doc/mydocs/ruby/quiz$ ./sample 1 100 | ./check 1 100
expected: 1.0
actual: 1.0
olaf@tack:~/doc/mydocs/ruby/quiz$


The "cheated" samples that have one value in each interval will always
yield actual: 1.0.

All things considered, this seems to be quite a good indicator to tell
the really random solutions from the not-so-random ones :)

Olaf
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,763
Messages
2,569,562
Members
45,038
Latest member
OrderProperKetocapsules

Latest Threads

Top