Binary Search

R

Roedy Green

Is there some reason Java has no built-in Collection that keeps a
sorted linear list and searches by binary search? You need something
light weight and fast for small sets. Would such a beast just not fit
into any of the interfaces?
 
E

Eric Sosman

Is there some reason Java has no built-in Collection that keeps a
sorted linear list and searches by binary search? You need something
light weight and fast for small sets. Would such a beast just not fit
into any of the interfaces?

TreeSet has the capabilities you want, I think. I'm not sure
what your threshold for "light weight" is, but if you're interested
in "small sets" binary search offers few benefits.
 
R

Roedy Green

Why not just use a TreeSet?

Because it is a hammer for removing fleas. I'm thinking of sets of
less than 20 elements. For small sets, I think you could get better
performance and certainly better RAM usage with something specialised
for small sets. I will probably invent something, where I can have a
little set attached to each of many objects then I can have a bakeoff
with a TreeSet to see where the break point is.
 
J

Joshua Cranmer

Because it is a hammer for removing fleas. I'm thinking of sets of
less than 20 elements. For small sets, I think you could get better
performance and certainly better RAM usage with something specialised
for small sets. I will probably invent something, where I can have a
little set attached to each of many objects then I can have a bakeoff
with a TreeSet to see where the break point is.

On the order of 20 elements, a simple unsorted array is good enough
performance for anything you want to do, unless that code is really hot,
in which case you probably need to roll your own specific data-structure
for the task at hand.
 
L

Lew

Is there some reason Java has no built-in Collection that keeps a
sorted linear list and searches by binary search? You need something
light weight and fast for small sets. Would such a beast just not fit
into any of the interfaces?

For small sets, binary search provides negligible benefit over any other.
Big-O analysis applies only for sufficiently large n. Below the cutoff,
performance of your treasured algorithm could be much worse than the linear
alternative.

All right, it cannot be much worse, because THE SET IS SMALL!
 
R

Roedy Green

For that small sets the difference in performance between a custom-made
"BinarySearchableList" and TreeSet will either be insignificant to the
overall program performance or so critical that you'll want to use
arrays instead of lists anyway.

It will matter if you have sufficiently large numbers of these sets.
You will spend more RAM on overhead than data.

The problem is, Map and SortedMap don't "map" well onto binary search.
binary search to work properly requires embedded keys. Maps require
them separate.
 
R

Roedy Green

Maintaining a list in sorted order is expensive if the list is stored as
a simple linear list (whether linked or, even worse, array-based).

The pattern I often work with is the elements are known at compile
time or when the program first starts, and from then on are
effectively read-only.

I thought you might replace the entire array on every add, and sort
on the first lookup. This as slow to build, but has a simple sorted
array of the exact size for lookup.
 
M

markspace

The pattern I often work with is the elements are known at compile
time or when the program first starts, and from then on are
effectively read-only.

For data known at compile time, it seems like Arrays.sort() and
Arrays.binarySearch() might be the way to go.
 
E

Eric Sosman

Then we lose encapsulation.

If the searching of these sets is so frequent that the code
deserves this kind of micro-optimization, one of the first micro-
optimizations you should make is to jettison encapsulation and
many other shibboleths, too: You can't afford them.

Contrapositive corollary: If you believe you can afford them,
you should also believe you don't need that much optimization.

Semi-rhetorical (but only semi-) question: Why are you using
Java-with-all-its-overheads instead of assembly code? If your
answer is "I don't need speed *that* badly," my rejoinder is
"The defense rests."
 
M

Mike Schilling

Leif Roar Moldskred said:
For that small sets the difference in performance between a custom-made
"BinarySearchableList" and TreeSet will either be insignificant to the
overall program performance or so critical that you'll want to use
arrays instead of lists anyway.

For less than 20 elements, a brute force "search them all" should be quite
fast enough.
 
L

Lew

Eric said:
If the searching of these sets is so frequent that the code
deserves this kind of micro-optimization, one of the first micro-
optimizations you should make is to jettison encapsulation and
many other shibboleths, too: You can't afford them.

"Shibboleth" a speech pattern, pronunciation or cultural reference that only
an insider or native would say or know correctly, as a way of uncovering true
origins. The old American World War II "Who won the 1943 World Series?"
shibboleth was supposed to reveal Nazi spies, who of course wouldn't know,
unlike a true red-blooded American.

I don't even know who won the 2010 World Series.
 
R

Roedy Green

Is there some reason Java has no built-in Collection that keeps a
sorted linear list and searches by binary search? You need something
light weight and fast for small sets. Would such a beast just not fit
into any of the interfaces?

I was going to implement a binary search in my Canadian Tax program I
am refactoring. I read my own notes and benchmark. Binary search is
useless. Either linear or HashMap or TreeMap is always faster.
 
M

Mike Schilling

Lew said:
"Shibboleth" a speech pattern, pronunciation or cultural reference that
only an insider or native would say or know correctly, as a way of
uncovering true origins. The old American World War II "Who won the 1943
World Series?" shibboleth was supposed to reveal Nazi spies, who of course
wouldn't know, unlike a true red-blooded American.

I don't even know who won the 2010 World Series.

Communist.
 
J

Joshua Cranmer

"Shibboleth" a speech pattern, pronunciation or cultural reference that
only an insider or native would say or know correctly, as a way of
uncovering true origins. The old American World War II "Who won the 1943
World Series?" shibboleth was supposed to reveal Nazi spies, who of
course wouldn't know, unlike a true red-blooded American.

I don't even know who won the 2010 World Series.

It was the Lakers, right? ;-)
 
M

Michael Wojcik

Patricia said:
I would recommend comparing its performance to a linear search with the
data in descending order of historical access frequency.

Agreed. Besides the branch-prediction behavior, binary search can
suffer from locality-of-reference performance issues, though that's
not likely to be a problem with Roedy's small arrays. But in general,
on modern systems, binary search may perform much worse (relative to
linear) than algorithmic complexity suggests.

You could implement the latter constraint (data in descending order of
access) dynamically using a move-to-front list, though you'd want a
data structure that offers cheap insertion at the front. And if you
need multithreaded access to the list that would also complicate
performance. (At first glance I think you can do it locklessly with a
linked list, but I'd want to review that rather more carefully before
using it.)

On retrieval, a MTF list moves the retrieved element to the front of
the list. So over time frequently-accessed items end up clustered near
the front.

(MTF lists are sometimes used in predictive compression schemes;
Burroughs and Wheeler suggest using one to reorganize entropy in front
of the Burroughs-Wheeler Transform, for example, though they actually
turn it into an encoder, where the output isn't the retrieved item but
its index in the list before it's moved.)
 
D

Daniele Futtorovic

"Shibboleth" a speech pattern, pronunciation or cultural reference that
only an insider or native would say or know correctly, as a way of
uncovering true origins. The old American World War II "Who won the 1943
World Series?" shibboleth was supposed to reveal Nazi spies, who of
course wouldn't know, unlike a true red-blooded American.

I don't even know who won the 2010 World Series.

Bayern Munich, right?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,906
Latest member
SkinfixSkintag

Latest Threads

Top