[snips]
H_n = O(ln n)
... which says the nth harmonic number is approximately
proportional to the log of n. It might be a million
times the log of n, or it might be the log of n plus
a million; all we're told is that it's approximately
proportional, with an unknown proportionality constant
and unknown contributions from lower-order terms.
Correct, and this is the normal way to use such notation.
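For a concrete feel, a quick numerical check (plain Python; nothing
assumed beyond the definition H_n = 1 + 1/2 + ... + 1/n):

    import math

    # H_n compared against ln n.  The difference settles toward the
    # Euler-Mascheroni constant (~0.5772) and the ratio heads to 1:
    # "proportional, up to lower-order terms", which is all that
    # O(ln n) promises.
    for n in (10, 1_000, 100_000, 10_000_000):
        h = sum(1.0 / k for k in range(1, n + 1))
        print(f"n={n:>10,}  H_n={h:8.4f}  ln n={math.log(n):8.4f}  "
              f"diff={h - math.log(n):.4f}")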
Consider the case where there is a "startup cost" of a million (e.g. the
runtime is n + 1,000,000, which is still O(n)). Now determine the overhead
when one item is involved... when a thousand are involved... when a
million are involved... when a trillion are involved.
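To put numbers on that exercise, a small sketch with a hypothetical cost
function of n + 1,000,000 units (purely illustrative):

    # Hypothetical: an O(n) algorithm with a fixed startup cost of
    # 1,000,000 units.  Watch the startup cost's share of the total
    # shrink as n grows.
    STARTUP = 1_000_000

    for n in (1, 1_000, 1_000_000, 1_000_000_000_000):
        total = n + STARTUP
        print(f"n={n:>16,}  total={total:>16,}  "
              f"startup share={STARTUP / total:.2%}")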
As the numbers increase, so too does the runtime, but the "startup cost"
remains constant. The algorithm is O(N); its runtime varies in proportion
to N, with the constant becoming insignificant as N increases.
Or your other example, a million times the log of N. Here the constant
factor never drops out of the actual cost (the curve is simply the log
curve scaled by a million), but it doesn't change how the runtime grows,
and the growth rate is all the O() notation describes.
I suspect the disagreement here is over whether one is discussing this in
terms of algorithmic complexity, or actual "cost". In terms of
algorithmic complexity, the constant factor is irrelevant; in terms of
actual cost, one needs to factor it in.
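To make the distinction concrete, a small sketch reusing the hypothetical
million-fold factor from above; squaring the input doubles both costs, so
both are O(log n), while the cost gap stays a constant million-fold:

    import math

    def cheap(n):       # log-shaped cost, constant factor 1
        return math.log(n)

    def costly(n):      # same shape, hypothetical million-fold factor
        return 1_000_000 * math.log(n)

    # Squaring n doubles both costs: identical growth behaviour,
    # wildly different actual cost.
    for n in (1_000, 1_000_000):
        print(f"n={n:>9,}  growth cheap={cheap(n**2)/cheap(n):.3f}  "
              f"growth costly={costly(n**2)/costly(n):.3f}  "
              f"cost gap={costly(n)/cheap(n):,.0f}x")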
One might examine this in terms of doing something such as indexing large
amounts of data off a remote server, when the process requires caching the
data locally in order to index it: the algorithmic complexity determines
which indexing system to use, but the "startup cost" - the added constant
- determines the best way to get the data here and imposes a fundamental
limit to the performance of the operation regardless of algorithm
efficiency.
Back to programming: Can you explain why Quicksort
is usually said to be faster than Heapsort, even though both have
asymptotic average running times of O(n log n)? If you cannot, what
reason do you have to prefer one over the other?[*]
Because heapsort has a larger constant: its sift-down loop hops around
memory with poor locality, while quicksort's partition pass scans
sequentially, so for a given number of elements N, quicksort runs faster
than heapsort, all else being equal.
Both, of course, are O(n log n) algorithms on average, as opposed to O(n)
or O(n^2), although quicksort is provably susceptible to O(n^2)
degeneration; with a naive first- or last-element pivot, already-sorted
input is enough to trigger it.
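The degeneration is easy to provoke; a minimal sketch with the naive
first-element pivot and a comparison counter bolted on (illustrative;
real quicksorts pick pivots more carefully):

    import random
    import sys

    def quicksort(a, counter):
        # Naive quicksort, first element as pivot: fine on random
        # data, quadratic on already-sorted input.
        if len(a) <= 1:
            return a
        pivot, rest = a[0], a[1:]
        counter[0] += len(rest)   # one comparison per element partitioned
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return (quicksort(left, counter) + [pivot]
                + quicksort(right, counter))

    sys.setrecursionlimit(10_000)     # sorted input recurses ~n deep
    n = 2_000
    for name, data in (("random", random.sample(range(n), n)),
                       ("sorted", list(range(n)))):
        counter = [0]
        quicksort(data, counter)
        print(f"{name}: {counter[0]:>9,} comparisons (n^2 = {n * n:,})")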
[*] Assuming for simplicity's sake that your choice
is based solely on expected running time; there are, of course, many
other things an engineer must consider in an actual situation.
If my choice is based solely on running time, I'd have to choose heapsort,
as its worst case is O(n log n). The exceptions: I'm working with
sufficiently small, or sufficiently well-defined, data sets that O(n^2)
degeneration doesn't affect the outcome enough to matter, or it can be
proven that the quicksort in question avoids degenerate behaviour in all
cases.
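Such designs do exist: introsort, used by several standard libraries
(C++'s std::sort implementations among them), runs quicksort but bails
out to heapsort once the recursion depth passes about 2 log n, capping
the worst case at O(n log n). A minimal sketch of the idea, with a
random pivot (illustrative, not a tuned implementation):

    import heapq
    import math
    import random

    def introsort(a):
        # Quicksort with a depth budget; past the budget, finish
        # the slice with heapsort so the worst case stays O(n log n).
        def heapsorted(xs):
            heapq.heapify(xs)
            return [heapq.heappop(xs) for _ in range(len(xs))]

        def sort(xs, depth):
            if len(xs) <= 1:
                return xs
            if depth == 0:
                return heapsorted(xs)
            pivot = random.choice(xs)
            left = [x for x in xs if x < pivot]
            mid = [x for x in xs if x == pivot]
            right = [x for x in xs if x > pivot]
            return sort(left, depth - 1) + mid + sort(right, depth - 1)

        return sort(list(a), 2 * max(1, int(math.log2(max(len(a), 1)))))

    print(introsort([5, 3, 8, 1, 9, 2, 7]))   # [1, 2, 3, 5, 7, 8, 9]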