On 03/06/07 14:06, user923005 wrote:
On 03/06/07 13:12, user923005 wrote:
[snip]
It's O(N). All the other solutions posed here are O(N*log(N)).
Observation #3: This is wrong. For (counter-)example,
I suggested using a heap, whose time complexity would be
O(N*log(1000))=O(N).
Using a heap is not O(N). If you consider the data set a fixed size
(e.g. one billion) and the region of interest a fixed size (e.g. 1000),
then true, we can call it O(N). But in algorithmic terms both of those
things are variables. We could (in fact) bogosort the whole mess and
then pick the top 1000 items: if N is a constant, then N! is a constant
too, and we could say the whole thing is O(1). Of course, it does not
work like that.
You seem to be mixing up two different N's: the size
of the data set being scanned, and the size of the heap.
My suggestion (which I didn't spell out fully, because I
thought it was obvious) is to initialize by reading the
first 1000 numbers and forming a heap from them, then to
read the remaining numbers one at a time, sifting-up each
into the heap and discarding the smallest to keep the
heap size fixed at 1000. Taking N to mean the data size:
Initialization: O(log 1)+O(log 2)+...+O(log 1000) = O(1)
Each subsequent sift-up: O(log 1000) = O(1)
Number of sift-ups: N-1000 = O(N)
Total: O(N)*O(1) + O(1) = O(N)
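A minimal sketch of the scheme described above (Python for brevity;
`top_k` is a hypothetical name, and `heapq.heapreplace` performs the
discard-smallest-then-admit step in a single O(log k) operation):

```python
import heapq
import random

def top_k(numbers, k):
    """Return the k largest of `numbers`, using a min-heap of fixed size k."""
    heap = []
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)      # initialization phase: O(log k) each
        elif x > heap[0]:                # heap[0] is the smallest of the k kept
            heapq.heapreplace(heap, x)   # discard smallest, admit x: O(log k)
    return sorted(heap, reverse=True)

data = list(range(100))
random.shuffle(data)
print(top_k(data, 5))   # [99, 98, 97, 96, 95]
```

Since k is pinned at 1000 in the problem as stated, every heap operation
costs O(log 1000) = O(1), and the loop runs N times: O(N) overall.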
The important point is that the heap size is fixed at
1000, so each heap operation takes O(1) time. If the
problem were to find the largest M of N numbers, then the
parameterization would be different. The heap approach
would then be characterized as
Initialization: O(log 1)+O(log 2)+...+O(log M) = O(M log M)
(or O(M), if the first M items are heapified bottom-up)
Each subsequent sift-up: O(log M)
Number of sift-ups: N-M
Total: (N-M)*O(log M) + O(M log M) = O(N log M)
.... but even here no O(N log N) term is present.
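For what it's worth, Python's standard library ships this strategy as
heapq.nlargest, which (in CPython) maintains a heap of only M elements
while scanning the N inputs, matching the O(N log M) bound:

```python
import heapq

data = [17, 42, 3, 99, 8, 56, 23, 77, 12, 65]
# Largest M=3 of N=10 values -- a heap of 3 elements suffices.
print(heapq.nlargest(3, data))   # [99, 77, 65]
```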