Did the sort do anything?

Roedy Green · May 10, 2011

Often you sort things when they are already sorted.

I am interested in simple algorithms to detect whether the sort
actually did anything.

Some suggestions:

1. do a pairwise compare of the items before the sort, and if all is
in order, bypass the sort.

2. back up a copy of the unsorted list of items. After the sort, do a
pairwise compare for identity. If all are identical, the sort did not
do anything.

3. write your own sort that has a boolean function you can ask if it
moved anything.

4. do some sort of checksum before and after.

--
Roedy Green Canadian Mind Products
http://mindprod.com
How long did it take after the car was invented before owners understood
cars would not work unless you regularly changed the oil and the tires?
We have gone 33 years and still it is rare to uncover a user who
understands computers don't work without regular backups.

markspace · May 10, 2011

Often you sort things when they are already sorted.

I am interested in simple algorithms to detect whether the sort
actually did anything.

Some suggestions:

1. do a pairwise compare of the items before the sort, and if all is
in order, bypass the sort.

This one here is probably the best idea of those you list. Note that
you'll save a bit of time avoiding sorts with the current merge sort
that Java uses, but a Tim sort (soon to be the standard sort in Java)
does this already, so in the long run you'll just add an O(n) to your sorts.
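
A minimal sketch of that pre-check, assuming a random-access List and a caller-supplied Comparator. The class and method names (PresortCheck, isSorted, sortIfNeeded) are made up for illustration and are not part of any JDK API:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class PresortCheck {
    /** Returns true if every adjacent pair is already in order under cmp. */
    static <T> boolean isSorted(List<T> items, Comparator<? super T> cmp) {
        // Assumes get(i) is cheap (e.g. ArrayList); use an iterator for linked lists.
        for (int i = 1; i < items.size(); i++) {
            if (cmp.compare(items.get(i - 1), items.get(i)) > 0) {
                return false; // first out-of-order pair found: stop early
            }
        }
        return true;
    }

    /** Sorts only when needed; returns true if the sort actually ran. */
    static <T> boolean sortIfNeeded(List<T> items, Comparator<? super T> cmp) {
        if (isSorted(items, cmp)) {
            return false; // already in order, bypass the sort
        }
        Collections.sort(items, cmp);
        return true;
    }
}
```

The scan costs at most n - 1 comparisons, and it stops at the first out-of-order pair, so unsorted input usually pays much less than that.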

If possible, you should get on the Java developer list (it requires a
full release though) and make some suggestions. Returning a boolean for
"I did something" is not a terrible idea. I also see occasionally
people ask for a way to keep two disparate lists in sync when sorting.
This might be useful too. I.e., provide a way to override the "swap"
function in the sort.

Andreas Leitgeb · May 10, 2011

Roedy Green said:
Often you sort things when they are already sorted.
I am interested in simple algorithms to detect whether the sort
actually did anything.
Some suggestions:
1. do a pairwise compare of the items before the sort, and if all is
in order, bypass the sort.
2. back up a copy of the unsorted list of items. After the sort, do a
pairwise compare for identity. If all are identical, the sort did not
do anything.
3. write your own sort that has a boolean function you can ask if it
moved anything.
4. do some sort of checksum before and after.

5. wrap the Collection so that certain modifications set an "unsorted"
flag, e.g. if an element is inserted/replaced/appended that doesn't compare
in the intended way with its neighbours. Then shortcut the sort if the said
flag is clear.
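
A rough sketch of such a wrapper, assuming only append and replace need tracking. SortTrackingList and its methods are hypothetical names, and a real version would have to cover every mutator of the wrapped collection:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class SortTrackingList<T> {
    private final List<T> items = new ArrayList<>();
    private final Comparator<? super T> cmp;
    private boolean unsorted = false; // set when a change breaks the order

    SortTrackingList(Comparator<? super T> cmp) {
        this.cmp = cmp;
    }

    /** Appends an item; flags the list if it sorts before the current last item. */
    void add(T item) {
        if (!items.isEmpty() && cmp.compare(items.get(items.size() - 1), item) > 0) {
            unsorted = true;
        }
        items.add(item);
    }

    /** Replaces an item; flags the list if it is out of order with either neighbour. */
    void set(int index, T item) {
        if (index > 0 && cmp.compare(items.get(index - 1), item) > 0) {
            unsorted = true;
        }
        if (index < items.size() - 1 && cmp.compare(item, items.get(index + 1)) > 0) {
            unsorted = true;
        }
        items.set(index, item);
    }

    /** Sorts only if the flag is set; returns true if the sort ran. */
    boolean sort() {
        if (!unsorted) {
            return false; // flag clear: shortcut the sort
        }
        Collections.sort(items, cmp);
        unsorted = false;
        return true;
    }

    /** Read-only view, so callers cannot bypass the tracking. */
    List<T> view() {
        return Collections.unmodifiableList(items);
    }
}
```

The flag is conservative: once set it stays set until the next sort, even if a later replacement happens to restore the order, so a true result from sort() means "the sort ran", not "the order changed".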

Daniele Futtorovic · May 10, 2011

Often you sort things when they are already sorted.

I am interested in simple algorithms to detect whether the sort
actually did anything.

Some suggestions:

1. do a pairwise compare of the items before the sort, and if all is
in order, bypass the sort.

2. back up a copy of the unsorted list of items. After the sort, do a
pairwise compare for identity. If all are identical, the sort did not
do anything.

3. write your own sort that has a boolean function you can ask if it
moved anything.

4. do some sort of checksum before and after.

I find the checksum idea quite intriguing, actually. It would have to do
more operations than some kind of perhaps binary seep through, (although
just as many if the list actually /is/ sorted). But the average greater
efficiency of e.g. #hashCode() vs. #compare() might well offset that,
meponders.

Is there any existing research on this?

Robert Klemme · May 11, 2011

Often you sort things when they are already sorted.

I am interested in simple algorithms to detect whether the sort
actually did anything.

What would the benefit of this be? By the time you learn the fact, the
current sorting run has already been done. And the data tells you nothing about
what future sorts will do. OK, you know, they won't do anything if you
do not modify the collection / array. But you did know that before,
didn't you? And if you insert in sort order then you also know that
there is no additional sort necessary. If you pull the data from some
external source you also do not know whether next time round you need to
sort. So what is this information useful for?

Btw, couldn't option 3 be tricky? Depending on the sorting algorithm
things might be moved around but the final order might be the same as
before. Not that this would be desirable but I am pretty sure there are
sorting algorithms which have this property (probably quicksort). Now,
how then do you interpret your boolean flag? Basically you only know
sequences are identical if there was no movement whatsoever, but if
there was movement then you do not know. Which means you would need to
apply one of the other strategies additionally...

Kind regards

robert

Tom Anderson · May 11, 2011

I am interested in simple algorithms to detect whether the sort actually
did anything.

Write a Comparator that keeps track of how many times it returns >0, and
keep your fingers crossed that the sort algorithm always passes it
parameters in index order? A quick try seems to indicate that this does
work with Arrays.sort, although i don't believe the spec makes it
something you could rely on.
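
For illustration, the counting Comparator might look like the sketch below. CountingComparator is a hypothetical name. The caveat above is the important part: nothing in the documented contract says the sort hands over arguments in index order, and an implementation that passes the later element first would bump the counter even on sorted input, so a nonzero count is a hint at best.

```java
import java.util.Comparator;

final class CountingComparator<T> implements Comparator<T> {
    private final Comparator<? super T> delegate;
    private int outOfOrder = 0; // number of compare() calls that returned > 0

    CountingComparator(Comparator<? super T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public int compare(T a, T b) {
        int result = delegate.compare(a, b);
        if (result > 0) {
            outOfOrder++; // first argument sorts after the second
        }
        return result;
    }

    /** How many comparisons saw the first argument sorting after the second. */
    int outOfOrderCount() {
        return outOfOrder;
    }
}
```

Pass an instance to Arrays.sort or Collections.sort, then read outOfOrderCount() afterwards.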

You could do something with a decorator that associates each element with
its index in the list, then use that information to do the order check in
the comparator. I'm not explaining this well, but it's a pretty bad idea,
so i won't.

If you wrote your own implementation, you could simply count the number of
times you swap elements, and if it's nonzero, you know the list was out of
order. This would be a good opportunity to implement a good sorting
algorithm - something that is, unlike mergesort, adaptive and in-place
(although not necessarily stable), perhaps Smoothsort:

http://www.keithschwarz.com/smoothsort/

You could also half-implement your own algorithm; do a linear scan through
your list, checking that each consecutive pair of elements is in order. As
soon as you hit a pair that isn't, stop, ask Collections.sort to sort the
remainder of the list, then go back and merge that half with the sorted
prefix you scanned through. Then return false. If you made it to the end
of the list without finding anything out of order, return true.
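
Roughly, that scan-then-sort-then-merge approach could be sketched as follows, assuming elements with a natural ordering and a random-access list. ScanThenSort and sortReportingOrder are illustrative names only:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ScanThenSort {
    /** Returns true if the list was already sorted; otherwise sorts it and returns false. */
    static <T extends Comparable<? super T>> boolean sortReportingOrder(List<T> list) {
        int n = list.size();
        int split = 1;
        // Linear scan: advance while each consecutive pair is in order.
        while (split < n && list.get(split - 1).compareTo(list.get(split)) <= 0) {
            split++;
        }
        if (split >= n) {
            return true; // reached the end without finding an out-of-order pair
        }
        // list[0, split) is a sorted prefix; sort the remainder separately.
        List<T> prefix = new ArrayList<>(list.subList(0, split));
        List<T> tail = new ArrayList<>(list.subList(split, n));
        Collections.sort(tail);
        // Merge the two sorted runs back into the original list.
        int i = 0, j = 0, k = 0;
        while (i < prefix.size() && j < tail.size()) {
            // <= keeps the merge stable: prefix elements win ties
            if (prefix.get(i).compareTo(tail.get(j)) <= 0) {
                list.set(k++, prefix.get(i++));
            } else {
                list.set(k++, tail.get(j++));
            }
        }
        while (i < prefix.size()) {
            list.set(k++, prefix.get(i++));
        }
        while (j < tail.size()) {
            list.set(k++, tail.get(j++));
        }
        return false;
    }
}
```

This version copies both runs for simplicity, so it uses O(n) extra space when the list is out of order and none when it is already sorted.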

tom

Tom Anderson · May 11, 2011

I find the checksum idea quite intriguing, actually. It would have to do
more operations than some kind of perhaps binary seep through,

A binary seep? What's that, then?

(although just as many if the list actually /is/ sorted). But the
average greater efficiency of e.g. #hashCode() vs. #compare() might well
offset that, meponders.

Do you think hashCode is generally more efficient than compare? I would
have thought the opposite, because for hashCode, you have to examine all
an object's fields, whereas for a compareTo, you only have to examine as
many as it takes to find some which are different. Unless the hash is
cached, as in String.

Is there any existing research on this?

The only thing i can think of which is remotely similar is the Rabin-Karp
string searching algorithm, which computes a rolling hash over a string to
search it. And the rsync algorithm, which does something similar. And
those really aren't even remotely similar.

tom

Daniele Futtorovic · May 11, 2011

A binary seep? What's that, then?

Uuhh... well... kinda like a binary sort, except you'd return (const
bool NOT_ORDERED) instead of swapping.

Do you think hashCode is generally more efficient than compare? I would
have thought the opposite, because for hashCode, you have to examine all
an object's fields, whereas for a compareTo, you only have to examine as
many as it takes to find some which are different. Unless the hash is
cached, as in String.

Yes, I thought so because of the caching. Any instance could cache the
hashCode() at least between mutation of its data (theoretically,
assuming it can track mutation), whereas a comparison cannot, because it
compares against something different every time.
In other words, between mutations, each invocation of hashCode() yields
the same result, but each invocation of compare() potentially a
different one (depending on what's compared against).

Joshua Cranmer · May 12, 2011

Btw, couldn't option 3 be tricky? Depending on the sorting algorithm
things might be moved around but the final order might be the same as
before. Not that this would be desirable but I am pretty sure there are
sorting algorithms which have this property (probably quicksort).

The term you are looking for is the stability of the sort. A stable sort
preserves the order of equally-compared objects; unstable sorts do not.
Of the major sorts, heapsort and quicksort are the only sorts which are
unstable, although a bad implementation of any stable sort could be
unstable. Wikipedia claims that quicksort can be implemented stably, but
that comes at an O(n) space price.

Java uses quicksort for primitive types in Arrays.sort and mergesort for
reference types: since you can't distinguish two equal primitive types,
you can't tell that quicksort isn't stable.

Robert Klemme · May 13, 2011

The term you are looking for is the stability of the sort.

No, not exactly. Stability only refers to the _output_ of the
complete sort operation. At least in theory a sort could move things
around even though it is stable.

The question what that bit of information is useful for still remains
unanswered.

Cheers

robert

Roedy Green · May 13, 2011

Do you think hashCode is generally more efficient than compare? I would
have thought the opposite, because for hashCode, you have to examine all
an object's fields, whereas for a compareTo, you only have to examine as
many as it takes to find some which are different. Unless the hash is
cached, as in String.

The object as a whole has a hashCode. That is just a 32-bit
operation. The compare might end up comparing a number of long
strings char by char. HashCodes are cached. The speed comes when it
is already computed.

In some cases you can tolerate a small rate of error. I would think
often mistaking a no-change for a change would not be catastrophic,
though the opposite normally would be. I don't think any
hashCode/checksum method could guarantee 100% accuracy.

Roedy Green · May 13, 2011

What would the benefit of this be?

A very simple example would be a sort utility that reads data into RAM
and sorts it. If the sort did not change the order, there is no need
to write it back.
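
For that use case, the copy-and-compare suggestion (option 2) could be sketched like this. It assumes the sort is stable, as Collections.sort is documented to be, so an already sorted list comes back with every element in its original slot. SortChangeDetector is a made-up name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class SortChangeDetector {
    /** Sorts items and returns true if any position now holds a different object. */
    static <T> boolean sortAndReportChange(List<T> items, Comparator<? super T> cmp) {
        List<T> before = new ArrayList<>(items); // shallow copy: references only
        Collections.sort(items, cmp);
        for (int i = 0; i < items.size(); i++) {
            if (before.get(i) != items.get(i)) { // identity, not equals()
                return true; // something moved, so the data needs writing back
            }
        }
        return false; // same objects in the same slots: skip the write
    }
}
```

The cost is one extra array of references plus a linear pass, and unlike a checksum the answer is exact.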

Roedy Green · May 13, 2011

Java uses quicksort for primitive types in Arrays.sort and mergesort for
reference types: since you can't distinguish two equal primitive types,
you can't tell that quicksort isn't stable.

Java's sort on objects is stable IIRC. There you can tell apart
objects that compare equal by their addresses, or by other fields in
the object that are not used in the compare.

In all my years, I had never noticed that stability is irrelevant for
primitives. It is like a pleasant little backscratch to have that
brought to my attention.

Robert Klemme · May 13, 2011

A very simple example would be a sort utility that reads data into RAM
and sorts it. If the sort did not change the order, there is no need
to write it back.

Sounds plausible - at first. OTOH in such a situation I would probably
rather remember the last modification time and only sort if it has
changed after the last sorted writing. That way you are even more
efficient because you save the effort of read IO as well.

Maybe it's just that I didn't have such a use case but I am still not
really convinced that what you are proposing is so useful.

Kind regards

robert

Spock · May 13, 2011

In some cases you can tolerate a small rate of error. I would think
often mistaking a no-change for a change would not be catastrophic,
though the opposite normally would be. I don't think any
hashCode/checksum method could guarantee 100% accuracy.

Unfortunately, hashCode/checksum methods will never detect change where
there is none, but may sometimes detect no change when there is one,
which is precisely the type of error that would be more serious.
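
A small sketch makes the direction of the error concrete. It uses List.hashCode() as the order-sensitive checksum (the List contract defines it as 31 * h + element hash, applied in iteration order). OrderChecksum is a hypothetical helper name:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class OrderChecksum {
    /**
     * Sorts items and compares an order-sensitive hash taken before and after.
     * true  = the order definitely changed.
     * false = the order probably did not change (a hash collision can hide a change).
     */
    static <T> boolean sortAndGuessChange(List<T> items, Comparator<? super T> cmp) {
        int before = items.hashCode(); // depends on element order
        Collections.sort(items, cmp);
        return before != items.hashCode();
    }
}
```

The strings "Aa" and "BB" have the same hashCode, so sorting the list ["BB", "Aa"] reorders it to ["Aa", "BB"] and yet this check returns false: a missed change, never a phantom one.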

Lawrence D'Oliveiro · May 14, 2011

... a Tim sort (soon to be the standard sort in Java) ...

Gee, I wonder what language that came from ...

Lew · May 14, 2011

Gee, I wonder what language that came from ...

Irrelevant.

Andreas Leitgeb · May 14, 2011

Lew said:
Irrelevant.

I wonder, why Lawrence isn't using Python instead of Java on
the Android...

Lawrence D'Oliveiro · May 15, 2011

Andreas Leitgeb said:
I wonder, why Lawrence isn't using Python instead of Java on
the Android...

Guess what I recently installed and started using
<http://code.google.com/p/android-scripting/>.

Lawrence D'Oliveiro · May 15, 2011

The object as a whole has a hashCode. that is just a 32 bit
operation.

Is 32 bits enough for a hash code? Birthday paradox and all that...
