[OT] stable algorithm with complexity O(n)

S

Steven D'Aprano

I agree. Most of his examples were tautologies. The magnet one was the
exception.

Then, to beat the O( n lg n ) limit, just break an assumption.

No no no, that's solving a different problem! You're welcome to solve a
different problem if you like, and sometimes doing so is very valuable:
sometimes the original problem turns out not to be important after all.
But don't pretend that it's the same problem.

Make a three-way comparison, make a non-comparison sort, or make non-
random inputs.

Again no. Multi-way comparisons merely change the base of the logarithm,
which in turn merely changes the constant term. It's still O(n*log n).
Non-comparison sorts are a useful technique, but it's changing the
problem, and they are only useful in very limited circumstances. There's
a good reason that most sort routines are based on O(n*log n) comparison
sorts instead of O(n) bucket sorts or radix sorts.

And the quip about non-random inputs just shows you don't understand the
problem you're being asked to solve. O(n*log n) is for the least number
of comparisons needed for the worst possible input. It's not for the best
possible input, or random input, or the average number of comparisons, or
any other measurement. You've misunderstood the question if you think
"give better input" is the answer.

And frankly, it's *easy* to come up with a comparison sort that has O(n)
behaviour for the best possible input. Bubble sort has O(n) behaviour for
input that is already sorted. Even the infamous bogosort has O(n)
behaviour for the best possible input:

def is_sorted(alist):
"""Return True if alist is sorted, otherwise False."""
for i in xrange(1, len(alist)):
if alist[i-1] > alist:
return False
return True

def bogosort(alist):
# Shuffle alist until it happens to be sorted
while not is_sorted(alist):
random.shuffle(alist)


I believe bogosort has behaviour O(n!), which is far worse than merely
exponential O(2**n) complexity. But when given already sorted input, the
version of bogosort above is O(n). Even the archtypically Awful (not just
Bad) algorithm is amazingly fast if you happen to give it the best
possible input.


(By the way, the above code is not suitable for use in production code.
Due to limitations of the standard Python random number generator, not
all permutations of lists larger than a certain size can be generated.
The default random number generator for Python 2.5 is the Mersenne
Twister, which has a period of 2**19937-1. But the number of permutations
of n items is n!, and n! is greater than 2**19937-1 for n=2081 or more.
That means that for any list with 2081 or more items, there is a finite
probability that the above implementation of bogosort will merely cycle
repeatedly over 2**19937-1 unsorted permutations and hence never
terminate.)
 
P

pruebauno

Non-comparison sorts are a useful technique, but it's changing the
problem, and they are only useful in very limited circumstances. There's
a good reason that most sort routines are based on O(n*log n) comparison
sorts instead of O(n) bucket sorts or radix sorts.
This is an assumption that I never quite understood. What most people
want is to have sorted data, they don't care if I used a sorting or
non-sorting comparison to do it. I think it is just that in most cases
n is not very big anyway and comparison sorts make it easier on the
programmer to create arbitrary types that are sortable.
 
P

pruebauno

This is an assumption that I never quite understood. What most people
want is to have sorted data, they don't care if I used a sorting or
non-sorting comparison to do it. I think it is just that in most cases
n is not very big anyway and comparison sorts make it easier on the
programmer to create arbitrary types that are sortable.

I meant they don't care if I use a comparison or non-comparison sort
of course.
 
D

Dan Upton

This is an assumption that I never quite understood. What most people
want is to have sorted data, they don't care if I used a sorting or
non-sorting comparison to do it. I think it is just that in most cases
n is not very big anyway and comparison sorts make it easier on the
programmer to create arbitrary types that are sortable.

And if n is small and sparse (ie, k > n) , O(k*n) for radix sort could
be worse than O(n^2). You could also ask why people make such a big
deal about quicksort over mergesort, since mergesort has a guaranteed
O(n log n) time whereas quicksort can be O(n^2) on pathological cases.

I think I remember learning in my algorithms class that for small
sorts (n < ~40) , bubblesort can actually be the fastest (or close to
the fastest) in terms of wall-clock time because it has a relatively
small constant factor in its O(n^2) complexity.
 
T

Terry Reedy

Dan said:
And if n is small and sparse (ie, k > n) , O(k*n) for radix sort could
be worse than O(n^2). You could also ask why people make such a big
deal about quicksort over mergesort, since mergesort has a guaranteed
O(n log n) time whereas quicksort can be O(n^2) on pathological cases.

Python's current list.sort uses mergesort because it better exploits
existing structure in a list.
I think I remember learning in my algorithms class that for small
sorts (n < ~40) , bubblesort can actually be the fastest (or close to
the fastest) in terms of wall-clock time because it has a relatively
small constant factor in its O(n^2) complexity.

It uses binary insert sort for n < 64 (chosen empirically) . That also
does O(n logn) comparisons and is only O(n**2) for data movement, which
a decent C compiler translates into fast block-move assembly instructions.

tjr
 
C

cmdrrickhunter

Just because its such an interesting problem, I'll take a stab at it.

It can be proven that you cannot sort an arbitrarily large set of
numbers, given no extra information, faster than O(n log n). It is
provable using information theory. However, if your teacher is giving
you evil problems, there's no reason you can't be evil in return:

Evil trick 1: Allocate an array of n^2 booleans, initialized to false
(This is O(n^2)). Declare this to be "before the sort" Then iterate
through the list and set each matching entry in the array to True
(This is O(n)). Now you have them "sorted." To get access to the
data, you need to iterate across the array O(n^2), but this is "after
the sort"

Evil trick 2: inserting into a set is O(1), so you could insert n
items into a set. This is O(n). Then you can argue that, since the
set cares not about order, it is as good as ordered!

Evil trick 3: pull up your python debugger, and have your program
search for the right answer in your teacher's test library. If done
smart, this could even be an O(1) sort o_O (queue Nukees reference:
"Welcome to my computer security course. Your grades have already
been entered into the school's grading systems as Fs. You have one
semester to change that, good luck")

.... these are fun =)
 
D

denisbz

It can be proven that you cannot sort an arbitrarily large set of
numbers, given no extra information, faster than O(n log n).

Cormen Leiserson and Rivest, "Algorithms", have a short clear chapter
on
"Sorting in linear time":
" ... counting sort, radix sort and bucket sort ... use operations
other than comparisons.
Consequently, the Omega( n lg n ) lower bound does not apply to
them."

Some of the book is in books.google.com; enjoy
 
D

Dan Upton

Cormen Leiserson and Rivest, "Algorithms", have a short clear chapter
on
"Sorting in linear time":
" ... counting sort, radix sort and bucket sort ... use operations
other than comparisons.
Consequently, the Omega( n lg n ) lower bound does not apply to
them."

But the key here is "given no extra information." Your examples of
non-comparison sorts require extra information, such as knowledge
about the possible range the numbers can take on.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top