Arrays.sort() slow?

V

vaste

Why does java use to quicksort to sort byte arrays? Wouldn't it be
much faster to just count the elements? (Almost count-sort.)
Similarily for shorts, although not always obviously faster (requires
65k*4).

E.g. one could do:
public static void sort(byte[] a) {
int[] reg = new int[256];
for(byte b : a)
reg[b+128]++;
int i = 0;
for(int k = 0; k < 256; k++)
for(int j = 0; j < reg[k]; j++)
a[i++] = (byte)(k-128);
}

I tried some, and it is indeed faster. Then again, I suppose it's
quite rare that one need to "sort" byte arrays without sattelite data.
 
L

Lew

Why does java [sic] use to quicksort to sort byte arrays? Wouldn't it be
much faster to just count the elements? (Almost count-sort.)
Similarily for shorts, although not always obviously faster (requires
65k*4).

E.g. one could do:
public static void sort(byte[] a) {
int[] reg = new int[256];
for(byte b : a)
reg[b+128]++;
int i = 0;
for(int k = 0; k < 256; k++)
for(int j = 0; j < reg[k]; j++)
a[i++] = (byte)(k-128);
}

I tried some, and it is indeed faster. Then again, I suppose it's
quite rare that one need to "sort" byte arrays without sattelite data.

Sweet algorithm.

Sorts with deep knowledge of their target type can take advantage of that
special knowledge to outstrip a general sort's performance. Some general
sorts perform acceptably for most input if the pre-sort order is random, but
perform horridly if the input is already ordered. If the likelihood of
ordered input is low, this will not cause trouble. If most or all input is
ordered, then a sort that "knows" about ordered input will be more efficient.

Sorts, searches and merges are huge topics to this day.

As to why the writers of one or another API choose a particular algorithm, in
this case possibly to achieve acceptable performance in the time domain
without too much impact in the memory domain while keeping things simple in
the development and maintenance domains. Just a WAG; I wasn't there.
 
D

Daniel Pitts

Why does java use to quicksort to sort byte arrays? Wouldn't it be
much faster to just count the elements? (Almost count-sort.)
Similarily for shorts, although not always obviously faster (requires
65k*4).

E.g. one could do:
public static void sort(byte[] a) {
int[] reg = new int[256];
for(byte b : a)
reg[b+128]++;
int i = 0;
for(int k = 0; k < 256; k++)
for(int j = 0; j < reg[k]; j++)
a[i++] = (byte)(k-128);
}

I tried some, and it is indeed faster. Then again, I suppose it's
quite rare that one need to "sort" byte arrays without sattelite data.

Is that faster in this case:

sort(new byte[]{255, 52, 1});

At what point does it become faster? How about with shorts? Should we
use int[65536] for shorts? If you have a huge list of bytes that you
need to sort, feel free to use the implementation you provided.
Otherwise, don't worry so much about it.
 
C

Christian

Why does java use to quicksort to sort byte arrays? Wouldn't it be
much faster to just count the elements? (Almost count-sort.)
Similarily for shorts, although not always obviously faster (requires
65k*4).

E.g. one could do:
public static void sort(byte[] a) {
int[] reg = new int[256];
for(byte b : a)
reg[b+128]++;
int i = 0;
for(int k = 0; k < 256; k++)
for(int j = 0; j < reg[k]; j++)
a[i++] = (byte)(k-128);
}

I tried some, and it is indeed faster. Then again, I suppose it's
quite rare that one need to "sort" byte arrays without sattelite data.

If your keys have numerical data a bin-sort is also possible ... and may
perform well with large arrays.
 
R

Roedy Green

Wouldn't it be
much faster to just count the elements? (Almost count-sort.)
Similarily for shorts, although not always obviously faster (requires
65k*4).

I refer to that as as the degenerate case of a radix sort. See
http://mindprod.com/jgloss/radixsort.html

The problem with using it in general is you must provide an additional
method beyond those normally used for a Comparator or Comparable. I
discovered only a few situations where radixSort beats Sun's sort in
general.

It might make sense to use a radixsort for arrays of primitives,
perhaps doing byte, short, char in one pass, and int in two passes and
long in four.

Disadvantages:

1. it takes more ram for the aux counters.

2. it takes full time even when the data are already sorted or almost
sorted. Very often only one element is out of position.

Do some benchmarks, and if there is a substantial improvement, put in
an RFE. See http://mindprod.com/jgloss/rfe.html

I discovered radixsort back in 1968. It was not until the 1990s and
the Internet did I learn it was already known. Many times I explained
it and Hashtables to all kinds of programmers. Back then, those two
algorithms they were not widely known.
 
V

vaste

I refer to that as as the degenerate case of a radix sort. Seehttp://mindprod.com/jgloss/radixsort.html

The problem with using it in general is you must provide an additional
method beyond those normally used for a Comparator or Comparable. I
discovered only a few situations where radixSort beats Sun's sort in
general.

It might make sense to use a radixsort for arrays of primitives,
perhaps doing byte, short, char in one pass, and int in two passes and
long in four.

Disadvantages:

1. it takes more ram for the aux counters.

2. it takes full time even when the data are already sorted or almost
sorted. Very often only one element is out of position.

Do some benchmarks, and if there is a substantial improvement, put in
an RFE. Seehttp://mindprod.com/jgloss/rfe.html

I discovered radixsort back in 1968. It was not until the 1990s and
the Internet did I learn it was already known. Many times I explained
it and Hashtables to all kinds of programmers. Back then, those two
algorithms they were not widely known.

What I described is not stable, so it wouldn't work with satellite
data. I.e. the java api has a method that takes an array of bytes
(without reference to any Object) and sorts them. For this they seem
to use quicksort, even for large arrays. This is obviously slower than
the quite trivial algorithm I gave (even if the list is already
sorted).

I wondered if there was any reason for this. I expected the methods in
the standard library to be somewhat optimized...

For sorting general Objects "linearly" one would indeed need a new
interface like comparable etc, and then e.g. radix sort might be a
good choice, but now we're talking about primitives. (And for ints or
longs, radix sort might also be a good choice.) Then again, radix sort
and count sort require twice the memory sorted, which sounds a bit
scary.

/Johan Fänge
 
R

Roedy Green

I wondered if there was any reason for this. I expected the methods in
the standard library to be somewhat optimized...

I think they did the object sort, then thought of the others as mere
convenience methods to the same algorithm. Sorting arrays of bytes
rare. So likely nobody thought of optimising it. I repeat.

Present your case with two benchmarks, with code. Point out the lack
of the usual downsides with a radixsort, and post an RFE. They usually
do ones that are easy.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,431
Messages
2,571,677
Members
48,796
Latest member
Greg L.

Latest Threads

Top