SortedSet faster than Collections.sort()

Discussion in 'Java' started by Timo Nentwig, Feb 24, 2005.

  1. Timo Nentwig

    Timo Nentwig Guest

    Hi!

    Is a TreeSet faster than port-sorting an ArrayList? According to some rough
    benchmarking I did it is by the factor of Collections.sort() but this
    benchmark was not done under reliable circumstances.

    Why can't I set an initial size for a TreeSet?
     
    Timo Nentwig, Feb 24, 2005
    #1
    1. Advertising

  2. Timo Nentwig wrote:

    > Hi!
    >
    > Is a TreeSet faster than port-sorting an ArrayList? According to some rough
    > benchmarking I did it is by the factor of Collections.sort() but this
    > benchmark was not done under reliable circumstances.


    TreeSet has to maintain its internal order, including keeping its tree
    balanced, with every addition. This can be tremendously slower than
    putting the same objects in a List and sorting them once, and we have
    seen it to be so, given enough elements in the set. On the other hand,
    it's already sorted whenever you need it.

    If you change your Set reasonably often relative to iterating over it or
    using the methods peculiar to SortedSet then a List that you sort at
    need might be better. Likewise if you only sometimes need it to be in
    order. Also consider that in many situations you don't really need
    sorting at all. For example, finding the minimum element can be done in
    O(N) by simply examining each element -- if that's all you need, and you
    only need it a small number of times (fewer than log(N)) then sorting
    the elements is wasteful.

    > Why can't I set an initial size for a TreeSet?


    What would it mean? You can't have the tree without the elements.


    --
    John Bollinger
     
    John C. Bollinger, Feb 24, 2005
    #2
    1. Advertising

  3. Timo Nentwig

    Alan Krueger Guest

    John C. Bollinger wrote:
    > TreeSet has to maintain its internal order, including keeping its tree
    > balanced, with every addition. This can be tremendously slower than
    > putting the same objects in a List and sorting them once, and we have
    > seen it to be so, given enough elements in the set.


    Not sure how this can be true. TreeSet Javadoc says, "This
    implementation provides guaranteed log(n) time cost for the basic
    operations (add, remove and contains)." This implies O(N log N) time to
    insert N items. Collections.sort Javadoc says, "This algorithm offers
    guaranteed n log(n) performance."

    > If you change your Set reasonably often relative to iterating over it or
    > using the methods peculiar to SortedSet then a List that you sort at
    > need might be better.


    Doubtful, since adding an item to a TreeSet would take O(log N) time;
    adding to a sorted list would be no faster and the add-and-resort for an
    unsorted list would certainly be slower.
     
    Alan Krueger, Feb 25, 2005
    #3
  4. Timo Nentwig

    Chris Smith Guest

    Alan Krueger <> wrote:
    > John C. Bollinger wrote:
    > > TreeSet has to maintain its internal order, including keeping its tree
    > > balanced, with every addition. This can be tremendously slower than
    > > putting the same objects in a List and sorting them once, and we have
    > > seen it to be so, given enough elements in the set.

    >
    > Not sure how this can be true. TreeSet Javadoc says, "This
    > implementation provides guaranteed log(n) time cost for the basic
    > operations (add, remove and contains)." This implies O(N log N) time to
    > insert N items. Collections.sort Javadoc says, "This algorithm offers
    > guaranteed n log(n) performance."


    The difference is in the definition of "n". The time to perform some
    set of add/remove/contains operations on a TreeSet would actually be on
    the order of m * log(n), where 'm' is the total number of operations and
    n is the average number of elements in the TreeSet. If all operations
    are add operations as in your example, then this turns out to be the
    same as n * log(n), but only because we can demonstrate that m == n / 2.

    However, if m is not on the order of n, then the fact doesn't hold.
    Furthermore, in reality, the number of operations on a long-lived data
    structure is rarely on the order of the average size, but is instead
    much greater. If the average size of the tree is on the order of the
    square root of the number of operations, then the total time of the task
    becomes O(n^2 * log(n)), which is looking considerably worse. If the
    number of ordered traversals is constant or otherwise doesn't scale with
    the size of the problem, then an explicit sort per traversal is still
    only O(n * log(n)), and the TreeSet is looking pretty bad by comparison.

    > Doubtful, since adding an item to a TreeSet would take O(log N) time;
    > adding to a sorted list would be no faster and the add-and-resort for an
    > unsorted list would certainly be slower.


    Adding to a sorted list is definitely O(log(n)), but adding to an
    unsorted list can be constant-time. Sorting at need can then be cheaper
    than working with the SortedSet data structure, as explained above.

    --
    www.designacourse.com
    The Easiest Way To Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
     
    Chris Smith, Feb 25, 2005
    #4
  5. Timo Nentwig

    el goog Guest

    > Adding to a sorted list is definitely O(log(n)),

    What do you mean by this? It's not true if your list is implemented
    using an array or linked list, at least if you want to maintain the
    list in sorted order.
     
    el goog, Feb 25, 2005
    #5
  6. Timo Nentwig

    Chris Smith Guest

    el goog <> wrote:
    > > Adding to a sorted list is definitely O(log(n)),

    >
    > What do you mean by this? It's not true if your list is implemented
    > using an array or linked list, at least if you want to maintain the
    > list in sorted order.


    You should read that as "O(log(n)) or worse", which doesn't impact the
    original point at all.

    --
    www.designacourse.com
    The Easiest Way To Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
     
    Chris Smith, Feb 25, 2005
    #6
  7. Timo Nentwig

    el goog Guest

    > You should read that as "O(log(n)) or worse", which doesn't impact
    the
    > original point at all.


    OK, I see your intent - you mean Theta(log(n)). You shouldn't use
    big-Oh
    notation with lower bounds. If you think about the definition, you'll
    see
    that "O(log(n)) or worse" is a vacuous statement.

    I don't disagree with your point, just your abuse of notation. :)
     
    el goog, Feb 26, 2005
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Doug Poland
    Replies:
    9
    Views:
    752
    VisionSet
    Sep 27, 2003
  2. Sasha
    Replies:
    3
    Views:
    13,245
    Adam Jenkins
    Jan 13, 2004
  3. Larry Coon

    Re-sorting a SortedSet

    Larry Coon, Jun 2, 2004, in forum: Java
    Replies:
    5
    Views:
    1,302
    Chris Smith
    Jun 5, 2004
  4. JerryJ
    Replies:
    11
    Views:
    1,424
    Dave Moore
    Apr 28, 2004
  5. John Carter

    Bug: SortedSet gives warning.

    John Carter, Mar 4, 2005, in forum: Ruby
    Replies:
    2
    Views:
    124
    John Carter
    Mar 4, 2005
Loading...

Share This Page