SortedSet faster than Collections.sort()

T

Timo Nentwig

Hi!

Is a TreeSet faster than port-sorting an ArrayList? According to some rough
benchmarking I did it is by the factor of Collections.sort() but this
benchmark was not done under reliable circumstances.

Why can't I set an initial size for a TreeSet?
 
J

John C. Bollinger

Timo said:
Hi!

Is a TreeSet faster than port-sorting an ArrayList? According to some rough
benchmarking I did it is by the factor of Collections.sort() but this
benchmark was not done under reliable circumstances.

TreeSet has to maintain its internal order, including keeping its tree
balanced, with every addition. This can be tremendously slower than
putting the same objects in a List and sorting them once, and we have
seen it to be so, given enough elements in the set. On the other hand,
it's already sorted whenever you need it.

If you change your Set reasonably often relative to iterating over it or
using the methods peculiar to SortedSet then a List that you sort at
need might be better. Likewise if you only sometimes need it to be in
order. Also consider that in many situations you don't really need
sorting at all. For example, finding the minimum element can be done in
O(N) by simply examining each element -- if that's all you need, and you
only need it a small number of times (fewer than log(N)) then sorting
the elements is wasteful.
Why can't I set an initial size for a TreeSet?

What would it mean? You can't have the tree without the elements.
 
A

Alan Krueger

John said:
TreeSet has to maintain its internal order, including keeping its tree
balanced, with every addition. This can be tremendously slower than
putting the same objects in a List and sorting them once, and we have
seen it to be so, given enough elements in the set.

Not sure how this can be true. TreeSet Javadoc says, "This
implementation provides guaranteed log(n) time cost for the basic
operations (add, remove and contains)." This implies O(N log N) time to
insert N items. Collections.sort Javadoc says, "This algorithm offers
guaranteed n log(n) performance."
If you change your Set reasonably often relative to iterating over it or
using the methods peculiar to SortedSet then a List that you sort at
need might be better.

Doubtful, since adding an item to a TreeSet would take O(log N) time;
adding to a sorted list would be no faster and the add-and-resort for an
unsorted list would certainly be slower.
 
C

Chris Smith

Alan Krueger said:
Not sure how this can be true. TreeSet Javadoc says, "This
implementation provides guaranteed log(n) time cost for the basic
operations (add, remove and contains)." This implies O(N log N) time to
insert N items. Collections.sort Javadoc says, "This algorithm offers
guaranteed n log(n) performance."

The difference is in the definition of "n". The time to perform some
set of add/remove/contains operations on a TreeSet would actually be on
the order of m * log(n), where 'm' is the total number of operations and
n is the average number of elements in the TreeSet. If all operations
are add operations as in your example, then this turns out to be the
same as n * log(n), but only because we can demonstrate that m == n / 2.

However, if m is not on the order of n, then the fact doesn't hold.
Furthermore, in reality, the number of operations on a long-lived data
structure is rarely on the order of the average size, but is instead
much greater. If the average size of the tree is on the order of the
square root of the number of operations, then the total time of the task
becomes O(n^2 * log(n)), which is looking considerably worse. If the
number of ordered traversals is constant or otherwise doesn't scale with
the size of the problem, then an explicit sort per traversal is still
only O(n * log(n)), and the TreeSet is looking pretty bad by comparison.
Doubtful, since adding an item to a TreeSet would take O(log N) time;
adding to a sorted list would be no faster and the add-and-resort for an
unsorted list would certainly be slower.

Adding to a sorted list is definitely O(log(n)), but adding to an
unsorted list can be constant-time. Sorting at need can then be cheaper
than working with the SortedSet data structure, as explained above.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
E

el goog

Adding to a sorted list is definitely O(log(n)),

What do you mean by this? It's not true if your list is implemented
using an array or linked list, at least if you want to maintain the
list in sorted order.
 
C

Chris Smith

el goog said:
What do you mean by this? It's not true if your list is implemented
using an array or linked list, at least if you want to maintain the
list in sorted order.

You should read that as "O(log(n)) or worse", which doesn't impact the
original point at all.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
E

el goog

You should read that as "O(log(n)) or worse", which doesn't impact
the
original point at all.

OK, I see your intent - you mean Theta(log(n)). You shouldn't use
big-Oh
notation with lower bounds. If you think about the definition, you'll
see
that "O(log(n)) or worse" is a vacuous statement.

I don't disagree with your point, just your abuse of notation. :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top