File handling question

Eric Sosman · May 19, 2006

Richard said:
Joe Smith said:

Mr. Heathfield's algorithm [...]

Click to expand...

Um, I think it's in Knuth somewhere (although I had a look just now, and
couldn't find it in TAOCP2.)

Algorithm 3.4.2R is a more general version, selecting
a sample of size n possibly greater than 1. But be sure
to read the third edition of TAOCP II; the second edition
also has an Algorithm 3.4.2R for the same purpose, but it's
a different algorithm!

Old Wolf · May 19, 2006

Joe said:
I find no error in the induction. One of the reasons that I find no error
could be that I've all but forgotten conditional probability. If that is
sound, the telescoping looks correct. I've taken the liberty of sending
this to a professor with whom I'm acquainted who authored a book in
statistical paradoxes to ask for comment.

There is no error. This is a great example of how good induction is.

Another way of writing the inductive step is to note that all of
the first N lines have equal probability of being selected; and
selecting or not selecting the new last line does not affect this,
so the new probability of one of the first N lines remaining selected
is (1 - P(new line)) / N .

Eric Sosman · May 19, 2006

Old said:
There is no error. This is a great example of how good induction is.

Another way of writing the inductive step is to note that all of
the first N lines have equal probability of being selected; and
selecting or not selecting the new last line does not affect this,
so the new probability of one of the first N lines remaining selected
is (1 - P(new line)) / N .

The proof can also be done non-inductively, by considering
the probability that any particular line is selected. The Kth
line is chosen if and only if

- When it is read it goes into the "saved" buffer, and

- It survives in the "saved" buffer as all the remaining
lines are read.

The Kth line enters the "saved" buffer with probability 1/K,
survives the challenge from the next line with probability
K/(K+1), from the line after that with probability (K+1)/(K+2),
and so on, surviving the final line's challenge with probability
(N-1)/N. Multiply the probabilities together, and each denominator
except the last cancels with the numerator of the fraction right
after it, leaving 1/N as the probability that the Kth line is
chosen, for any 0 < K <= N.

The non-inductive proof is shorter, and may be more appealing
to some. To me, though, induction seems a more natural way to
prove things about iterative processes; the correspondence of
the inductive step with the iterative step is usually compelling.

Joe Smith · May 19, 2006

Old Wolf said:
There is no error. This is a great example of how good induction is.

Another way of writing the inductive step is to note that all of
the first N lines have equal probability of being selected; and
selecting or not selecting the new last line does not affect this,
so the new probability of one of the first N lines remaining selected
is (1 - P(new line)) / N .

I would call that more an appeal to strong induction, whereas Mr. Sosman's
is to weak induction. Of course, that wouldn't change the outcome, just
what is "known" during the inductive step. BTW, you were right elsewhere
that I was mistakenly talking about the Euclidean Algorithm as the Reverse.
Why you think public key cryptogography is OT here is beyond me, except that
that might mean you only discuss it on your Own Terms. joe

How can I view / open / render / display a pdf file with c code?	0	Sep 23, 2023
Error Handling	27	May 28, 2010
Noob File IO question	8	Apr 4, 2007
Errors When Pulling Information from CSV File to Python	0	Dec 10, 2020
Exception Handling	33	Mar 11, 2012
How to sort a CSV file with merge sort JAVA	7	May 6, 2021
Question about my projects	3	Jul 23, 2021
Handling different implementations of the same algorithm	1	Aug 10, 2011

File handling question

Eric Sosman

Old Wolf

Eric Sosman

Joe Smith

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads