Java VM 1.6.0 Sorting Collections

T

Tom Anderson

Arrays.sort() will use quicksort for integral types.

For all primitive types, in fact, including doubles. Although i note it
won't sort booleans (for shame!).

I should not have said "Java's standard sort"; i should have said "Java's
standard sort for Lists", or "Collections' standard sort". Since the OP
has a List and doesn't want to use an array, the behaviour of Arrays.sort
is not terribly important - unless the OP wants to get clever and wrap an
array of primitive doubles in a List interface, as has been suggested.

tom
 
R

Roedy Green

Theoretically HotSpot will be a native compiler for this test, as was
pointed out to me upthread.

It is something to try. You can download the evaluation version, and
just type jc xxx.jar to create an exe to do the experiment. Jet also
has some self-tuning heap code, which might be crucial with such a
large test set.
 
R

Roedy Green

Theoretically HotSpot will be a native compiler for this test, as was
pointed out to me upthread.

The problem with a HotSpot and a benchmark is:

1. the first part of your benchmark is run unoptimised.

2. part of your benchmark is chewed up optimising the code to native.
With a native compiler, this is all done before your benchmark starts.
HotSpot cannot afford to pore over the fine tuning of the optimisation
because the clock is running.

To mask those effects you need a very long-running benchmarks, or
perhaps running various lengths of benchmark and doing some math to
factor out the effects. You can also run multiple trials and throw
away the early/slow ones.
 
J

Jim Janney

Patricia Shanahan said:
Jim said:
The evil approach would be to use reflection to get the elementData
field in ArrayList, and call Arrays.sort directly on that.

Before doing that, I suggest testing the performance of sorting an array
of Double references. There are two differences between the List sort
and sorting a double[]:

1. The copying between the List and the array of references that
actually gets sorted.

2. The extra cost of doing a Comparable compare rather than a double
comparison.

The elementData trick would avoid the first of these costs, but not the
second.

And the cost of copying is linear, so for large lists the cost of
sorting will still dominate. Log2 of a million is around twenty, so
avoiding the copying may not buy very much.
 
J

Joshua Cranmer

And the cost of copying is linear, so for large lists the cost of
sorting will still dominate. Log2 of a million is around twenty, so
avoiding the copying may not buy very much.

You have to do two copies in List sort, so moving to arrays eliminates 2
out of 22 linear scans, or about 9% of the total sort runtime. Actually,
at the maximum array size (2^31 - 1), lg is 31, so you'd get about 6%
speedup. Perhaps Java should have implemented Collections.sort in such a
manner that sorting something that implements List and RandomAccess is
sorted in-place.
 
A

Arne Vajhøj

another optimisation is to use a native compiler. See

Questionable.

Today conversion to native is done to make decompilation
harder not to improve performance.

Arne
 
A

Arne Vajhøj

I have been having some trouble with the performance of Java. In my
code I have narrowed it down to the sort:

For instance:


import java.util.*;
public class SortTest
{
public static void main(String args[])
{
final int vecSize = 1000000;
ArrayList<Double> vec = new ArrayList<Double>(vecSize);
final int numItr = 10;
ArrayList<Double> times = new ArrayList<Double>(numItr);
for (int i=0; i<numItr; ++i)
{
for (int k=0; k<vecSize; ++k)
{
vec.add(k,Math.random());
}
long startTime = System.nanoTime();
java.util.Collections.sort(vec);
long endTime = System.nanoTime();
times.add(i,(endTime-startTime)*1.e-9);
vec = new ArrayList<Double>(vecSize);
}
double avg=0.0;
for (Double val:times)
{
avg += val;
}
avg /= times.size();
System.out.println("To sort " + vec.size() + " elements " +
numItr + " times took " + avg + " seconds on average");

}
}



This prints about 0.58 seconds on average. How can I optimize this
reasonably? Note I am only timing the sort function. Using an array
instead of ArrayList is not an option. I tried Vector but it didn't
help. The equivalent C++ code (using vector) runs in about 0.37
seconds when built with optimization (g++ version 4.2.1 using -03
optimization ). I am running on a MacBook Pro 2.66 GHz Intel Core i7
with 4 GB of ram. I also tried this example on Linux and saw no
difference. What is causing over 40% difference in speed? I must be
doing something wrong in my java code.


#include<iostream>
#include<vector>
#include<cstdlib>
#include<algorithm>
#include<sys/time.h>
#include<numeric>
using namespace std;

template<class T>
void FillRand(T start, T end)
{
while (start != end)
{
*start++ = double(rand())/RAND_MAX;
}
}

class TimeStamp
{
public:
void start()
{
gettimeofday(&st_val,0);
}
void stop()
{
gettimeofday(&end_val,0);
}
double elapsedSec() const
{
return end_val.tv_sec-st_val.tv_sec +
(end_val.tv_usec-st_val.tv_usec)*1.e-6;
}
private:
timeval st_val;
timeval end_val;
};

int main()
{
vector<double> vec(1000000);
const long num_itr = 10;
vector<double> time_stamps(num_itr);
TimeStamp ts;
for (long i=0; i<num_itr; ++i)
{
FillRand(vec.begin(),vec.end());
ts.start();
std::sort(vec.begin(),vec.end());
ts.stop();
time_stamps = ts.elapsedSec();
}
double avg=std::accumulate(time_stamps.begin(),time_stamps.end(),
0.0)/time_stamps.size();
cout<< "To sort "<< vec.size()<< " elements took "<< avg<< "
seconds on average\n";
}


It does not sound surprising to me.

If we are using C terminology, then:
- in Java you have an array of pointers to double
- in C++ you have an array of double

It sounds very plausible that this will cause a performance
difference.

Use double[] in Java.

Or live with the performance.

Arne
 
C

ClassCastException

For all primitive types, in fact, including doubles. Although i note it
won't sort booleans (for shame!).

I can't, offhand, think of any use for that that doesn't boil down to
simply making it slightly faster to count how many trues and falses there
are. An actual count, on the other hand, is O(n) instead of the n log n
typical of sorts. And of course if the list of trues and falses changes
often, it should be even more efficient to just wrap it in something
whose add, remove, etc. methods adjust tally numbers and then punt to the
delegate's add, remove, or whatever. (There'd also have to be query
methods for the tallies.)

Of course none of the above applies to sorting more complex objects on a
boolean *attribute*. E.g. sorting a group of Person objects by age() and
then stable-sorting by sex(), where the latter is two-valued and thus
might be implemented as a boolean. The above applies only to sorting a
simple boolean[] or List<Boolean>.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top