Binary Search

L

Lawrence D'Oliveiro

I was going to implement a binary search in my Canadian Tax program I
am refactoring. I read my own notes and benchmark. Binary search is
useless. Either linear or HashMap or TreeMap is always faster.

Linear search faster than binary??

I wonder if this is a result of CPU caching behaviour...
 
L

Lawrence D'Oliveiro

You could implement the latter constraint (data in descending order of
access) dynamically using a move-to-front list, though you'd want a
data structure that offers cheap insertion at the front.

Or you could simply maintain an access counter attached to each element, and
re-sort after every, say, 10**n lookups (for n ≥ 3 or something).
 
A

Andreas Leitgeb

Lawrence D'Oliveiro said:
Linear search faster than binary??

For smaller number of items not all that surprising.
For larger numbers: Roedy also mentioned Hash-/TreeMaps.

Roedy, would you tell us at which sizes the maps
started to be better than linear search in your test?

Also, whether your searches were always for exact matches, or if
you also had searches for "smallest greater/eq than" or the like.
I wonder if this is a result of CPU caching behaviour...

Sounds likely. I think it was even mentioned already - especially
the slightly more predictable branchings of linear searches (thus
pre-caching of the code-path that's more likely to follow).

Locality of access likely plays a smaller role in Java, unless
we speak exclusively of arrays of primitives.
 
M

Mike Schilling

Lawrence D'Oliveiro said:
Linear search faster than binary??

I wonder if this is a result of CPU caching behaviour...

Linear search will be faster for a sufficiently small list size simply
because the overhead is lower.
 
L

Lawrence D'Oliveiro

The problem is, Map and SortedMap don't "map" well onto binary search.
binary search to work properly requires embedded keys. Maps require
them separate.

Sounds like the Java Map classes are not well designed.
 
W

Wojtek

Leif Roar Moldskred wrote :
For that small sets the difference in performance between a custom-made
"BinarySearchableList" and TreeSet will either be insignificant to the
overall program performance or so critical that you'll want to use
arrays instead of lists anyway.

I had a requirement to store over 3K Strings and to retrieve them in a
random fashion. I used a HashSet. One day I had some free time, so I
converted the HashSet to an array, then compared the results.

The array was a faster lookup. In fact I calculated that if the
application ran for a week I could save every week almost a full
milli-second compared to the HashSet.

YMMV
 
M

Mike Schilling

Lawrence D'Oliveiro said:
Sounds like the Java Map classes are not well designed.

Or that someone doesn't understand them. Embedded keys can be made to work
perfectly well with SortedMaps simply by making both arguments to put() the
same, and providing a comparator that can locate the key in the object.
 
L

Lawrence D'Oliveiro

Leif Roar Moldskred wrote :

I had a requirement to store over 3K Strings and to retrieve them in a
random fashion. I used a HashSet. One day I had some free time, so I
converted the HashSet to an array, then compared the results.

The array was a faster lookup. In fact I calculated that if the
application ran for a week I could save every week almost a full
milli-second compared to the HashSet.

How long would you need to run it to make back the time you spent doing the
analysis?
 
L

Lew

objectivity. Thus it's impossible to refute, because the similarity is
introduced somewhere upstream from the ear. So I believe you, "Lawrence". From
where you listen, it does sound that way.

It occurs to me I am unclear with the term "upstream". In this instance I
imagine ideas as something like salmon struggling to get to the brain from the
ear, so I intended "upstream" to mean in the listener, not the speaker or the
message itself.
 
L

Lew

Leif Roar Moldskred wrote :

I had a requirement to store over 3K Strings and to retrieve them in a random
fashion. I used a HashSet. One day I had some free time, so I converted the
HashSet to an array, then compared the results.

The array was a faster lookup. In fact I calculated that if the application
ran for a week I could save every week almost a full milli-second compared to
the HashSet.

How could you have accepted such an inefficient algorithm?

Did you account for variations in load, garbage collection, anti-virus
activity, Hotspot optimizations and the like? You know that micro-benchmarks
are notoriously unreliable for general conclusions. For all I know, I could
wind up saving a millisecond a week using HashSet compared to array for my
load profiles.

Don't forget to account for JVM load time!

It occurs to me that the HashSet approach might suffer from infelicitous
coding idioms that interfere with GC. Perhaps you're packratting. (See
<http://mindprod.com/jgloss/packratting.html>.) So what you need to do is
make sure you carefully scope alllll your variables in the HashSet scenario,
and thoroughly instrument your code so you can determine where you're
colliding with the GC. (See <http://mindprod.com/jgloss/profiler.html>.)

You will, of course, need to institute a task force to performance-test your
String-storage module, and to stand up a full suite of JMeter-based tests with
detailed reports to management on whether that millisecond savings is sustainable.

You'll also need contingency strategies, like, what if the program has to
suspend? You'll need to serialize the data and deserialize it later to
resume. Your performance team will need to consider the impact of the choice
of HashSet vs. array for that functionality.

You play this right and that String storage subsystem will merit an entire
department and plenty of big iron with you as lead performance architect. Who
knows? You might want up saving two milliseconds!

--
Lew
Raise your hands - who took all this bullshit seriously? Come on now, admit it!
If you did take it seriously, you don't have what it takes to be a programmer.
Get out now.
I mean it. We don't want you.
 
W

Wojtek

Lawrence D'Oliveiro wrote :
How long would you need to run it to make back the time you spent doing the
analysis?

This was a Web app with 24/7 up-time. So not long. It took about
18,000,000 milli-seconds to write, so about 18,000,000 weeks, or just
less than 350 years.

That would be in the year 2360, so the UTC date conversions would still
work!
 
L

Lawrence D'Oliveiro

Or that someone doesn't understand them. Embedded keys can be made to
work perfectly well with SortedMaps simply by making both arguments to
put() the same, and providing a comparator that can locate the key in the
object.

So why isn’t there a single-argument overload of the put method to save you
the trouble?
 
M

Mike Schilling

Lawrence D'Oliveiro said:
So why isn’t there a single-argument overload of the put method to save
you
the trouble?

There is, if you use a Set instead of a Map. Which, for sorting, works
equally well.
 
M

Mike Schilling

Leif Roar Moldskred said:
How would Java know which field or property you intend to use as the key?

For a SortedMap, specify a Comparator that knows.
 
L

Lew

Mike said:
There is, if you use a Set instead of a Map. Which, for sorting, works equally
well.

In the standard API, HashSet<E> is implemented as a Map<E, E> under the hood,
as is TreeSet<E>. "Lawrence" could spend its time profitably by studying the
Javadocs.
 
M

Mike Schilling

Lew said:
In the standard API, HashSet<E> is implemented as a Map<E, E> under the
hood, as is TreeSet<E>. "Lawrence" could spend its time profitably by
studying the Javadocs.

To be fair, it wasn't Lawrence who said that Maps don't work with embedded
keys.
 
T

Tom Anderson

There is, if you use a Set instead of a Map.

Except there's no way to retrieve the value from the Set later! We've been
round this one before - the idea that sets might support some sort of
'get' or 'canonicalize' method to do just that.

Mind you, with an embedded key, i'm not sure how you'd do lookups even
with a map. To retrieve some object, wouldn't you need to have it to hand
in the first place, to be able to pass in its embedded key? Or would you
also support lookup by freestanding key?

tom
 
M

Mike Schilling

Tom Anderson said:
Except there's no way to retrieve the value from the Set later! We've been
round this one before - the idea that sets might support some sort of
'get' or 'canonicalize' method to do just that.

I was thinking that you'd iterate through the sorted set to get a sorted
List or array, but that wasn't the use we were discussing, was it.
Mind you, with an embedded key, i'm not sure how you'd do lookups even
with a map. To retrieve some object, wouldn't you need to have it to hand
in the first place, to be able to pass in its embedded key? Or would you
also support lookup by freestanding key?

You can look it up with an object that's equal to (as opposed to identical
to) the one embedded in the value. But you knew that.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top