Chris said:
Patricia Shanahan wrote:
However, in this case we are sorting ints, which are known
to be numbers in a bounded range, so a radix sort is
possible and is O(n).
Ah, yes. I should have thought of that myself (too used to thinking of int-s
as a reasonably approximation to integers -- silly).
Come to think of it, a bucket sort (if that's the right name) could do it too,
although the space overheads (2**32 elements-worth of int[] arrays) would make
it a little impractical. (Unless the range if ints were known to be more
severely limited than to just the 32-bit range).
But neither algorithm sounds as if it's the one that John was talking about --
unless the "uniform distribution" precondition he mentioned was to allow most
of the steps of a radix sort to do some useful partitioning.
I have lost the reference (and indeed, my test implementation), but the
sort in question is more or less an in-place bucket sort. First it
computes a histogram with dynamically-determined (but uniform) bucket
width (requiring two linear scans of the array), then it uses the
histogram to assist in a round-robin style element permutation so that
the elements of each histogram bucket end up consecutive in the array
and bucket-wise in the correct sort order (each element is moved at
most once), and finally it follows up with an insertion sort of the
nearly-sorted array (equivalently: an insertion sort of each bucket).
It's that last step that's the problem: if the array elements are
distributed roughly uniformly, then the number of elements for any
bucket (and hence the total time / comparisons for any bucket) can be
bounded by a small positive number, thus the insertion sort performs
linearly with respect to the total number of elements. If, however, the
distribution is highly non-uniform, then the O(N*N) complexity of the
insertion sort can kick in on one or a few buckets that contain the
majority of the elements.
The problem with non-uniform value distributions might be addressable by
using nonuniform bucket widths, but then I don't see how you assign a
value to its bucket in constant time (as required for linear scaling of
this algorithm) -- you'd have to perform some kind of lookup.
Well, there you are, probably more than you wanted to know.