Binary Search

Discussion in 'Java' started by Roedy Green, Mar 24, 2011.

  1. Roedy Green

    Roedy Green Guest

    Is there some reason Java has no built-in Collection that keeps a
    sorted linear list and searches by binary search? You need something
    light weight and fast for small sets. Would such a beast just not fit
    into any of the interfaces?

    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur.
    ~ Red Adair
    Roedy Green, Mar 24, 2011
    #1
    1. Advertising

  2. Roedy Green

    Eric Sosman Guest

    On 3/24/2011 5:08 AM, Roedy Green wrote:
    > Is there some reason Java has no built-in Collection that keeps a
    > sorted linear list and searches by binary search? You need something
    > light weight and fast for small sets. Would such a beast just not fit
    > into any of the interfaces?


    TreeSet has the capabilities you want, I think. I'm not sure
    what your threshold for "light weight" is, but if you're interested
    in "small sets" binary search offers few benefits.

    --
    Eric Sosman
    d
    Eric Sosman, Mar 24, 2011
    #2
    1. Advertising

  3. Roedy Green

    Roedy Green Guest

    On Thu, 24 Mar 2011 04:38:06 -0500, Leif Roar Moldskred
    <> wrote, quoted or indirectly quoted someone who
    said :

    >Why not just use a TreeSet?


    Because it is a hammer for removing fleas. I'm thinking of sets of
    less than 20 elements. For small sets, I think you could get better
    performance and certainly better RAM usage with something specialised
    for small sets. I will probably invent something, where I can have a
    little set attached to each of many objects then I can have a bakeoff
    with a TreeSet to see where the break point is.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur.
    ~ Red Adair
    Roedy Green, Mar 24, 2011
    #3
  4. On 03/24/2011 10:38 AM, Roedy Green wrote:
    > On Thu, 24 Mar 2011 04:38:06 -0500, Leif Roar Moldskred
    > <> wrote, quoted or indirectly quoted someone who
    > said :
    >
    >> Why not just use a TreeSet?

    >
    > Because it is a hammer for removing fleas. I'm thinking of sets of
    > less than 20 elements. For small sets, I think you could get better
    > performance and certainly better RAM usage with something specialised
    > for small sets. I will probably invent something, where I can have a
    > little set attached to each of many objects then I can have a bakeoff
    > with a TreeSet to see where the break point is.


    On the order of 20 elements, a simple unsorted array is good enough
    performance for anything you want to do, unless that code is really hot,
    in which case you probably need to roll your own specific data-structure
    for the task at hand.

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Mar 24, 2011
    #4
  5. Roedy Green

    Lew Guest

    On 03/24/2011 05:08 AM, Roedy Green wrote:
    > Is there some reason Java has no built-in Collection that keeps a
    > sorted linear list and searches by binary search? You need something
    > light weight and fast for small sets. Would such a beast just not fit
    > into any of the interfaces?


    For small sets, binary search provides negligible benefit over any other.
    Big-O analysis applies only for sufficiently large n. Below the cutoff,
    performance of your treasured algorithm could be much worse than the linear
    alternative.

    All right, it cannot be much worse, because THE SET IS SMALL!

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, Mar 24, 2011
    #5
  6. Roedy Green

    Roedy Green Guest

    On Thu, 24 Mar 2011 09:58:40 -0500, Leif Roar Moldskred
    <> wrote, quoted or indirectly quoted someone who
    said :

    >For that small sets the difference in performance between a custom-made
    >"BinarySearchableList" and TreeSet will either be insignificant to the
    >overall program performance or so critical that you'll want to use
    >arrays instead of lists anyway.


    It will matter if you have sufficiently large numbers of these sets.
    You will spend more RAM on overhead than data.

    The problem is, Map and SortedMap don't "map" well onto binary search.
    binary search to work properly requires embedded keys. Maps require
    them separate.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur.
    ~ Red Adair
    Roedy Green, Mar 24, 2011
    #6
  7. Roedy Green

    Roedy Green Guest

    On Thu, 24 Mar 2011 08:02:05 -0700, Peter Duniho
    <> wrote, quoted or indirectly quoted
    someone who said :

    >Maintaining a list in sorted order is expensive if the list is stored as
    >a simple linear list (whether linked or, even worse, array-based).


    The pattern I often work with is the elements are known at compile
    time or when the program first starts, and from then on are
    effectively read-only.

    I thought you might replace the entire array on every add, and sort
    on the first lookup. This as slow to build, but has a simple sorted
    array of the exact size for lookup.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur.
    ~ Red Adair
    Roedy Green, Mar 24, 2011
    #7
  8. Roedy Green

    markspace Guest

    On 3/24/2011 4:44 PM, Roedy Green wrote:

    >
    > The pattern I often work with is the elements are known at compile
    > time or when the program first starts, and from then on are
    > effectively read-only.


    For data known at compile time, it seems like Arrays.sort() and
    Arrays.binarySearch() might be the way to go.
    markspace, Mar 25, 2011
    #8
  9. Roedy Green

    Roedy Green Guest

    On Thu, 24 Mar 2011 18:59:03 -0500, Leif Roar Moldskred
    <> wrote, quoted or indirectly quoted someone who
    said :

    >In which case you'll want to use arrays instead, anyway.


    Then we lose encapsulation.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur.
    ~ Red Adair
    Roedy Green, Mar 25, 2011
    #9
  10. Roedy Green

    Eric Sosman Guest

    On 3/24/2011 8:37 PM, Roedy Green wrote:
    > On Thu, 24 Mar 2011 18:59:03 -0500, Leif Roar Moldskred
    > <> wrote, quoted or indirectly quoted someone who
    > said :
    >
    >> In which case you'll want to use arrays instead, anyway.

    >
    > Then we lose encapsulation.


    If the searching of these sets is so frequent that the code
    deserves this kind of micro-optimization, one of the first micro-
    optimizations you should make is to jettison encapsulation and
    many other shibboleths, too: You can't afford them.

    Contrapositive corollary: If you believe you can afford them,
    you should also believe you don't need that much optimization.

    Semi-rhetorical (but only semi-) question: Why are you using
    Java-with-all-its-overheads instead of assembly code? If your
    answer is "I don't need speed *that* badly," my rejoinder is
    "The defense rests."

    --
    Eric Sosman
    d
    Eric Sosman, Mar 25, 2011
    #10
  11. "Leif Roar Moldskred" <> wrote in message
    news:...
    > Roedy Green <> wrote:
    >>
    >> Because it is a hammer for removing fleas. I'm thinking of sets of
    >> less than 20 elements. For small sets, I think you could get better
    >> performance and certainly better RAM usage with something specialised
    >> for small sets.

    >
    > For that small sets the difference in performance between a custom-made
    > "BinarySearchableList" and TreeSet will either be insignificant to the
    > overall program performance or so critical that you'll want to use
    > arrays instead of lists anyway.


    For less than 20 elements, a brute force "search them all" should be quite
    fast enough.
    Mike Schilling, Mar 25, 2011
    #11
  12. Roedy Green

    Lew Guest

    Shibboleth (Was: Binary Search)

    Eric Sosman wrote:
    > If the searching of these sets is so frequent that the code
    > deserves this kind of micro-optimization, one of the first micro-
    > optimizations you should make is to jettison encapsulation and
    > many other shibboleths, too: You can't afford them.


    "Shibboleth" a speech pattern, pronunciation or cultural reference that only
    an insider or native would say or know correctly, as a way of uncovering true
    origins. The old American World War II "Who won the 1943 World Series?"
    shibboleth was supposed to reveal Nazi spies, who of course wouldn't know,
    unlike a true red-blooded American.

    I don't even know who won the 2010 World Series.

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, Mar 25, 2011
    #12
  13. Roedy Green

    Roedy Green Guest

    On Thu, 24 Mar 2011 02:08:07 -0700, Roedy Green
    <> wrote, quoted or indirectly quoted
    someone who said :

    >Is there some reason Java has no built-in Collection that keeps a
    >sorted linear list and searches by binary search? You need something
    >light weight and fast for small sets. Would such a beast just not fit
    >into any of the interfaces?


    I was going to implement a binary search in my Canadian Tax program I
    am refactoring. I read my own notes and benchmark. Binary search is
    useless. Either linear or HashMap or TreeMap is always faster.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    If you think it’s expensive to hire a professional to do the job, wait until you hire an amateur.
    ~ Red Adair
    Roedy Green, Mar 25, 2011
    #13
  14. Re: Shibboleth (Was: Binary Search)

    "Lew" <> wrote in message
    news:imh4ft$jt2$...
    > Eric Sosman wrote:
    >> If the searching of these sets is so frequent that the code
    >> deserves this kind of micro-optimization, one of the first micro-
    >> optimizations you should make is to jettison encapsulation and
    >> many other shibboleths, too: You can't afford them.

    >
    > "Shibboleth" a speech pattern, pronunciation or cultural reference that
    > only an insider or native would say or know correctly, as a way of
    > uncovering true origins. The old American World War II "Who won the 1943
    > World Series?" shibboleth was supposed to reveal Nazi spies, who of course
    > wouldn't know, unlike a true red-blooded American.
    >
    > I don't even know who won the 2010 World Series.


    Communist.
    Mike Schilling, Mar 25, 2011
    #14
  15. Re: Shibboleth (Was: Binary Search)

    On 03/25/2011 12:08 AM, Lew wrote:
    > Eric Sosman wrote:
    >> If the searching of these sets is so frequent that the code
    >> deserves this kind of micro-optimization, one of the first micro-
    >> optimizations you should make is to jettison encapsulation and
    >> many other shibboleths, too: You can't afford them.

    >
    > "Shibboleth" a speech pattern, pronunciation or cultural reference that
    > only an insider or native would say or know correctly, as a way of
    > uncovering true origins. The old American World War II "Who won the 1943
    > World Series?" shibboleth was supposed to reveal Nazi spies, who of
    > course wouldn't know, unlike a true red-blooded American.
    >
    > I don't even know who won the 2010 World Series.


    It was the Lakers, right? ;-)

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Mar 25, 2011
    #15
  16. Re: Shibboleth (Was: Binary Search)

    Mike Schilling wrote:
    > "Lew" <> wrote in message
    > news:imh4ft$jt2$...
    >>
    >> I don't even know who won the 2010 World Series.

    >
    > Communist.


    Huh. I didn't even know they made it to the playoffs.
    Michael Wojcik, Mar 25, 2011
    #16
  17. Patricia Shanahan wrote:
    > On 3/24/2011 5:19 PM, markspace wrote:
    >> On 3/24/2011 4:44 PM, Roedy Green wrote:
    >>>
    >>> The pattern I often work with is the elements are known at compile
    >>> time or when the program first starts, and from then on are
    >>> effectively read-only.

    >>
    >> For data known at compile time, it seems like Arrays.sort() and
    >> Arrays.binarySearch() might be the way to go.

    >
    > I would recommend comparing its performance to a linear search with the
    > data in descending order of historical access frequency.


    Agreed. Besides the branch-prediction behavior, binary search can
    suffer from locality-of-reference performance issues, though that's
    not likely to be a problem with Roedy's small arrays. But in general,
    on modern systems, binary search may perform much worse (relative to
    linear) than algorithmic complexity suggests.

    You could implement the latter constraint (data in descending order of
    access) dynamically using a move-to-front list, though you'd want a
    data structure that offers cheap insertion at the front. And if you
    need multithreaded access to the list that would also complicate
    performance. (At first glance I think you can do it locklessly with a
    linked list, but I'd want to review that rather more carefully before
    using it.)

    On retrieval, a MTF list moves the retrieved element to the front of
    the list. So over time frequently-accessed items end up clustered near
    the front.

    (MTF lists are sometimes used in predictive compression schemes;
    Burroughs and Wheeler suggest using one to reorganize entropy in front
    of the Burroughs-Wheeler Transform, for example, though they actually
    turn it into an encoder, where the output isn't the retrieved item but
    its index in the list before it's moved.)

    --
    Michael Wojcik
    Micro Focus
    Rhetoric & Writing, Michigan State University
    Michael Wojcik, Mar 25, 2011
    #17
  18. Re: Shibboleth (Was: Binary Search)

    On 25/03/2011 05:08, Lew allegedly wrote:
    > Eric Sosman wrote:
    >> If the searching of these sets is so frequent that the code
    >> deserves this kind of micro-optimization, one of the first micro-
    >> optimizations you should make is to jettison encapsulation and
    >> many other shibboleths, too: You can't afford them.

    >
    > "Shibboleth" a speech pattern, pronunciation or cultural reference that
    > only an insider or native would say or know correctly, as a way of
    > uncovering true origins. The old American World War II "Who won the 1943
    > World Series?" shibboleth was supposed to reveal Nazi spies, who of
    > course wouldn't know, unlike a true red-blooded American.
    >
    > I don't even know who won the 2010 World Series.
    >


    Bayern Munich, right?
    Daniele Futtorovic, Mar 25, 2011
    #18
  19. Roedy Green

    Lew Guest

    Re: Shibboleth (Was: Binary Search)

    Daniele Futtorovic wrote:
    > Lew allegedly wrote:
    >> Eric Sosman wrote:
    >>> If the searching of these sets is so frequent that the code
    >>> deserves this kind of micro-optimization, one of the first micro-
    >>> optimizations you should make is to jettison encapsulation and
    >>> many other shibboleths, too: You can't afford them.


    >> "Shibboleth" a speech pattern, pronunciation or cultural reference that
    >> only an insider or native would say or know correctly, as a way of
    >> uncovering true origins. The old American World War II "Who won the 1943
    >> World Series?" shibboleth was supposed to reveal Nazi spies, who of
    >> course wouldn't know, unlike a true red-blooded American.
    >>
    >> I don't even know who won the 2010 World Series.


    > Bayern Munich, right?


    Dude! Their men won the European Club Cup in 1992, but I don't see that they
    have any honors for 2010.
    http://en.wikipedia.org/wiki/European_Chess_Club_Cup

    Oh, wait, did you mean their football team?

    --
    Lew
    It was the Giants, ok? The Giants.
    http://en.wikipedia.org/wiki/2010_World_Series
    Lew, Mar 25, 2011
    #19
  20. Re: Shibboleth (Was: Binary Search)

    "Michael Wojcik" <> wrote in message
    news:...
    > Mike Schilling wrote:
    >> "Lew" <> wrote in message
    >> news:imh4ft$jt2$...
    >>>
    >>> I don't even know who won the 2010 World Series.

    >>
    >> Communist.

    >
    > Huh. I didn't even know they made it to the playoffs.


    Actually, the Reds did make it last year.
    Mike Schilling, Mar 26, 2011
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Andy
    Replies:
    1
    Views:
    355
    Jack Klein
    Nov 25, 2003
  2. Timmy
    Replies:
    5
    Views:
    465
  3. Replies:
    9
    Views:
    452
    Keith Thompson
    Jul 3, 2009
  4. sanborne
    Replies:
    5
    Views:
    2,007
  5. Bogdan

    Binary tree search vs Binary search

    Bogdan, Oct 18, 2010, in forum: C Programming
    Replies:
    22
    Views:
    3,065
    Michael Angelo Ravera
    Oct 21, 2010
Loading...

Share This Page