Verifying a list is alphabetized

Martin Gregorie

Tony Hoare, being a scholar and a gentleman, made a strong but measured
claim about it (from the conclusion of [1]):
This was in "Software Tools in Pascal" (Kernighan & Plauger).
I very much doubt that anyone worth listening to has ever made a claim
that quicksort is the fastest in every situation. Pathological cases
have been known for such a long time that to do so would be to court
honorary membership of the Flat Earth Society.
K&P describe shellsort as approximating N**1.5, and say that quicksort
averages N log N but can degrade to N**2, though anything worse than
N log N is 'very rare', adding that to sort an array you "just enter
'quick(v, 1, n)' and stand well back". I probably ascribed too much
weight to that remark, but OTOH the book says almost nothing about what
can degrade its performance.
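
For what it's worth, the degradation is easy to see in a toy version.
Below is a minimal sketch of my own in Java (nothing to do with K&P's
Pascal quick()) that uses the first element as the pivot: feed it an
already-sorted or reverse-sorted array and every partition splits N
elements into 0 and N-1, so the comparisons grow like N**2 and the
recursion goes N deep.

public class NaiveQuick {
    // Hoare-style partition around the FIRST element -- the naive choice
    // that makes sorted and reverse-sorted input the worst case.
    // Median-of-three (or a random pivot) avoids that particular trap.
    static void quick(int[] v, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = v[lo];
        int i = lo, j = hi;
        while (i <= j) {
            while (v[i] < pivot) i++;
            while (v[j] > pivot) j--;
            if (i <= j) {
                int t = v[i]; v[i] = v[j]; v[j] = t;
                i++; j--;
            }
        }
        quick(v, lo, j);   // on sorted input one side is always empty,
        quick(v, i, hi);   // so the recursion is N deep, not log N
    }

    public static void main(String[] args) {
        int[] alreadySorted = new int[2_000];
        for (int k = 0; k < alreadySorted.length; k++) alreadySorted[k] = k;
        quick(alreadySorted, 0, alreadySorted.length - 1);
        // Make the array much bigger and this eventually dies with a
        // StackOverflowError -- a symptom of the N-deep recursion.
        System.out.println("done");
    }
}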

It then goes on to talk about sort-merges, which I really related to since
I was using a box with 48K RAM at the time and wasn't long away from
mainframes with little RAM (multi-user operation with 32 Kwords of RAM:
96 KB equivalent), so all disk sort utilities were of the sort-merge
variety (tape sorts were pure multi-way merges preceded by a collation
phase).
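
The merge phase itself is small enough to sketch. Purely as an
illustration (Java, obviously nothing like the code we ran back then):
a k-way merge of pre-sorted runs driven by a priority queue of run
heads, which is essentially what the disk and tape merges did once the
initial runs had been formed.

import java.util.*;

public class KWayMerge {
    // Merge any number of already-sorted runs: the queue holds one
    // "head" element per run, so each output element costs O(log k).
    public static List<Integer> merge(List<List<Integer>> runs) {
        // Each queue entry is {value, runIndex, positionWithinRun}.
        PriorityQueue<int[]> heads =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int r = 0; r < runs.size(); r++)
            if (!runs.get(r).isEmpty())
                heads.add(new int[] { runs.get(r).get(0), r, 0 });

        List<Integer> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] e = heads.poll();
            out.add(e[0]);
            int run = e[1], next = e[2] + 1;
            if (next < runs.get(run).size())
                heads.add(new int[] { runs.get(run).get(next), run, next });
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(
            List.of(1, 4, 9), List.of(2, 3, 8), List.of(5, 6, 7))));
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}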
 
markspace

One practical use of such a data set would be to bring home to algorithm
design students and programmers choosing algorithms the difference
between "average O(n log(n) time" and "O(n log(n)) time", which by
default means a bound on the maximum time across all input sets.


I think you're missing a parenthesis in the first formula there.

So let me take a stab at that. "Average O" time means, given a bunch of
different data, we think that some sort of "average" of them will have
an upper bound of O, but some might not. And "average" means "average
of the data we measured," not "the data you're using."

Whereas "O" means the upper bound is O, period, regardless of the data.
 
markspace

Usually, when it is quoted in Big-O notation it should mean "calculated
average over all data sets". Big-O is not really subject to
experimentation, because it is about limits as size tends to infinity.


Really. I'll have to take another look at the math then (not sure where
my math books are right now). I didn't think this could actually be
worked out with any rigor or precision. As you say, the data set
"tends to infinity." It'll be interesting to review those calculations.
 
Eric Sosman

[...]
Usually, when it is quoted in Big-O notation it should mean "calculated
average over all data sets".

I must disagree. Big-Oh is a statement about some quantity, and
carries no implication about the nature of that quantity: average,
best case, worst case, or whatever. We can say of Quicksort that the
average running time is O(N log N) and the worst is O(N*N), and the
big-Ohs in the two assertions are independent: They are statements
about two different quantities.

In fact, Big-Oh is often used in an entirely different context,
describing a tiny quantity rather than a large one. For example, we
can say that the sum 1/1 + 1/2 + ... + 1/N is ln(N) + O(1). There is
no implication of an average of any kind in this statement, nor even
of a data set or a universe of data sets; there is only a statement
about the rate of growth of the discrepancy between the exact sum
and the approximation.
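
To put a number on that discrepancy: it settles down to Euler's
constant, roughly 0.5772, so it stays bounded however large N gets.
A throwaway check (Java, purely illustrative):

public class HarmonicGap {
    // Numeric check that H(N) - ln(N) stays bounded -- i.e. the sum
    // 1/1 + 1/2 + ... + 1/N really is ln(N) + O(1).
    public static void main(String[] args) {
        for (int n = 10; n <= 10_000_000; n *= 10) {
            double h = 0.0;
            for (int k = 1; k <= n; k++) h += 1.0 / k;
            System.out.printf("N = %9d   H(N) - ln(N) = %.6f%n",
                              n, h - Math.log(n));
        }
    }
}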

If notions like "average over all data sets" appear, they do not
arise from the Big-Oh but from the nature of the quantity that Big-Oh
describes. If we're talking about "average," we must describe the
genesis of this average: Over the entire universe of cases or over a
designated subset, weighted or unweighted, time-decayed or static,
whatever. It's not Big-Oh that contributes such things, but the
definition of the quantity under consideration.
Big-O is not really subject to
experimentation, because it is about limits as size tends to infinity.

Or, sometimes, as something tends to zero. You'll find this sense
used all over the place in any textbook on numerical analysis: Integrate
by the trapezoidal rule with step size h and the error is O(h^2); use
Simpson's rule and you can get O(h^4). These Big-Oh's still describe
upper bounds, but the quantities become smaller rather than larger in
the direction of interest.
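
If you want to watch those orders appear, here is a back-of-the-envelope
check (Java, just an illustration): integrate sin(x) over [0, pi], whose
exact value is 2, and keep doubling the number of panels. The
trapezoidal error shrinks by roughly 4x per halving of h, Simpson's by
roughly 16x.

public class QuadratureOrders {
    // Composite trapezoidal rule with n panels of width h = pi/n.
    static double trapezoid(int n) {
        double h = Math.PI / n, sum = (Math.sin(0) + Math.sin(Math.PI)) / 2;
        for (int i = 1; i < n; i++) sum += Math.sin(i * h);
        return sum * h;
    }

    // Composite Simpson's rule; n must be even.
    static double simpson(int n) {
        double h = Math.PI / n, sum = Math.sin(0) + Math.sin(Math.PI);
        for (int i = 1; i < n; i++)
            sum += (i % 2 == 1 ? 4 : 2) * Math.sin(i * h);
        return sum * h / 3;
    }

    public static void main(String[] args) {
        for (int n = 4; n <= 64; n *= 2)
            System.out.printf("n = %2d   trap err = %.2e   simpson err = %.2e%n",
                              n, Math.abs(2 - trapezoid(n)), Math.abs(2 - simpson(n)));
    }
}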
 
Gene Wirchenko

IIRC QuickSort goes a bit bananas if the file is already sorted. ISTR
some sort doing some random swaps at the start to make sure that does
not happen. It does do a check first to see if things are in order.

Ordered either way. If you try to sort something that is in
reverse order, that is also a pathological case. That worst case is
what makes Quicksort O(n squared).
Most of the time when I sort things, they are already sorted. Nothing
has changed since the last sort, or when I manually edit files I
naturally keep them in order.

Then an insertion sort will do.
You also check that it is in order in an assert, not as a prelude to
sorting. If it is not in order, something went wrong.
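
And since the thread began with verifying that a list is alphabetized:
the check is a single linear pass, which is what makes it cheap enough
to live inside an assert. A sketch in Java (the method name and
generics are mine, not from any earlier post):

import java.util.List;

public class SortedCheck {
    // One linear pass: true if every element is <= its successor.
    static <T extends Comparable<? super T>> boolean isSorted(List<T> list) {
        for (int i = 1; i < list.size(); i++)
            if (list.get(i - 1).compareTo(list.get(i)) > 0)
                return false;
        return true;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Carol");
        assert isSorted(names) : "list fell out of order";  // run with java -ea
        System.out.println(isSorted(names));                 // true
    }
}

For strings, compareTo is case-sensitive; a case-blind alphabetic check
would compare with String.CASE_INSENSITIVE_ORDER instead.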

It depends on your needs.

Sincerely,

Gene Wirchenko
 
Roedy Green


I coded a bunch of sorts with test drivers if you are interested in
comparing them under various circumstances:
http://mindprod.com/products2.html#HEAPSORT HeapSort
http://mindprod.com/products2.html#QUICKSORT QuickSort
http://mindprod.com/products2.html#RADIXSORT RadixSort
http://mindprod.com/products2.html#SHELLSORT ShellSort
http://mindprod.com/jgloss/sort.html#BADSORT BubbleSort
--
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.
 
