Benchmarks


Phil Carmody

Juha Nieminen said:
You would have a hard time proving that.

He shouldn't. It should be pretty clear. No hash table needs
to examine each element more than once.

Phil
 

Juha Nieminen

Nick said:
CAREFUL ALGORITHM SELECTION CAN CREAM MICRO-OPTIMISATION

That's so true. However, we shouldn't completely dismiss
"micro-optimization" and "hacker-optimization" once we have come
up with the fastest possible overall algorithm. Even though two
implementations might both be eg. O(n lg n) with a rather small
constant factor, one of them might still be 10 times faster than
the other if it performs clever low-level tricks inside the
innermost loop (which might be run eg. millions of times).

Of course when dealing with hacker optimization, there's always
a point where it's too detrimental to the readability and
maintainability of the code in relation to the speed benefit it
gives. If the code becomes significantly more obfuscated while
giving a speed boost of 1%, it might not be worth the trouble.
Personally I prefer to write 100 lines of clear, readable and
maintainable code, even if it's 1% slower than 1000 lines of code
worthy of the IOCCC.
 

Phil Carmody

Phil Carmody said:
He shouldn't. It should be pretty clear. No hash table needs
to examine each element more than once.

Oooops, suffering from Falconeritis - but you did over-snip rather.
The whole task is indeed o(n^2), but Omega(n).

Phil
 

Juha Nieminen

user923005 said:
Cuckoo hashing has guaranteed O(1) lookup and delete and amortized
O(1) insert.

Wikipedia says: "Lookup requires inspection of just two locations in
the hash table, which takes constant time in the worst case".

I don't understand how that is possible. Suppose an element is
inserted into the table. Then later a new element displaces this first
element to a second location. Then later another new element, to be put
in this second location, displaces this first element to a third
location. How exactly do you find this first element anymore? It's not
in either of the two alternative locations defined by the two hashing
functions for that element.
 

James Kanze

[ ... ]
Just as a note, the common claim that hash tables are O(1)
per access whereas search trees are O(log n) per access is
misleading, and is arrived at by an apples and oranges
comparison. The number of probes in a hash table is O(1)
whereas the number of probes in a search tree is O(log n).
However the cost of computing the hash code for independent
keys is O(m) where m is the average key length, which is
necessarily greater than log2(n). In comparison based trees
the cost of the comparison is O(m), i.e., the probe cost is
O(m). In radix based trees the probe cost is O(1). If O(m)
= O(n) the execution costs per probe (ignoring access
issues) are:
Keeping in mind, however, that m and n will usually be quite a
bit different, so any similarity between O(m) and O(n) is
purely coincidental. In reality, the performance of hash
tables tends to be rather different in general than the
performance of tree-based systems (radix or comparison).

Funny that no one has mentioned it, but the performance of a
hash table is very, very dependent on the quality of the hashing
function. With a perfect hashing function, the access time is
O(1), but with a poor hashing function, it can be significantly
worse---even to the point of O(n). Unless you're sure that you
can define a good hashing function, you're probably better off
with a balanced tree. (The radix tree is an interesting
suggestion, often ignored. Given the sparseness of the entries,
it may result in a significant lack of locality and too many
cache misses. But given typical word length, this isn't
certain, and it has other advantages, like not requiring the
word to be actually assembled in memory.)
Note, however, that there are factors that don't show up in a
big-O comparison that can still be quite substantial,
especially for collections of reasonable sizes. For one obvious
one, consider the common case (like this one) where the keys
are strings. Hashing the string requires looking at the whole
string, but comparing two strings will often look at _far_
fewer -- for the first few probes, it'll typically only
involve looking at one or two characters.
From a theoretical viewpoint, this effect is negligible -- but
given the number of words in a typical vocabulary, it can be
quite substantial.

I've done some actual measures (some time back), see
http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html. In
this case, I only measured access, not insertion, but for the
list of words in the ispell French dictionary, I found that a
hash table with a good hashing function (FNV or my Mersenne
prime algorithm) was better than three times faster; using a
list of actual program symbols, the difference was still better
than 2.5 (the French word list contained 94883 different words,
the program symbols 15047).

I've since added some additional hashing algorithms, and tested
on different machines (which does make a difference), but I've
not gotten the results written up. I think I'll add an
implementation using a radix tree, just for good measure.
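
For reference, a minimal sketch of the FNV-1a member of the FNV family
mentioned above (the constants are the published 32-bit FNV parameters;
the function name and signature are mine, not from the benchmark code):

    #include <cstdint>
    #include <string>

    // FNV-1a, 32-bit variant: XOR in each byte, then multiply by the FNV prime.
    inline std::uint32_t fnv1a_32(const std::string& key)
    {
        std::uint32_t h = 2166136261u;      // FNV offset basis
        for (unsigned char c : key) {
            h ^= c;
            h *= 16777619u;                 // FNV prime
        }
        return h;
    }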
 

James Kanze

That's so true. However, we shouldn't completely dismiss
"micro-optimization" and "hacker-optimization" once we have
come up with the fastest possible overall algorithm.

Yes, but this should only be undertaken when necessary, under
the control of a profiler---what optimizes on one platform may
slow down on another. And don't forget that a good compiler
will do many of these optimizations for you. A long time ago, I
tried replacing "h = 127 * h + *p" (in a hashing algorithm) with
"h = (h << 7 - h) + *p", knowing that my hardware didn't have
hardware multiply. The results were slower than the original;
the compiler also converted the multiplication into a shift and
subtract, and since it knew the final purpose in those
operations, was actually able to save one machine instruction
over an explicit shift and subtract.
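
For illustration, here is roughly what the two versions of that hash
step look like in context (a sketch, not the original code; as
described above, a compiler that knows the target can do this strength
reduction by itself):

    // Multiply form: leave the strength reduction to the compiler.
    unsigned hash_mul(const char* p)
    {
        unsigned h = 0;
        for (; *p != '\0'; ++p)
            h = 127 * h + (unsigned char)*p;
        return h;
    }

    // Hand-written form: (h << 7) - h is 127 * h spelled out.
    unsigned hash_shift(const char* p)
    {
        unsigned h = 0;
        for (; *p != '\0'; ++p)
            h = (h << 7) - h + (unsigned char)*p;
        return h;
    }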
Even though two implementations might both be eg. O(n lg n)
with a rather small constant factor, one of them might still
be 10 times faster than the other if it performs clever
low-level tricks inside the innermost loop (which might be run
eg. millions of times).

I've never seen a factor of 10 here. Even a factor of 2 is
exceptional (and usually indicative of a very poor optimizer in
the compiler).
 

James Dow Allen

Just as a note, the common claim that hash tables are O(1) per
access whereas search trees are O(log n) per access is
misleading, ...computing the hash code ...
is necessarily greater
than log2(n)....
hash table      - O(log n)

Note that, for sufficiently pedantic O(.)
estimates, many of the usual O(.) measures are wrong.
A single arithmetic op may be considered to require
O(log N) time since log N bits are needed to represent
an index on a set of size N.

Returning to OP's problem, I wonder: are Compact
Hash Trees in wide use? If there's interest
I'll post my implementation.

James Dow Allen
 

Juha Nieminen

James said:
I've never seen a factor of 10 here. Even a factor of 2 is
exceptional (and usually indicative of a very poor optimizer in
the compiler).

There's only so much a compiler can do. There are cases where only the
programmer (rather than the compiler) can understand what's going on and
is in a position to optimize accordingly.

For example, comparing two strings for inequality is O(1) if the
maximum size of the string is fixed (eg. you know that no string has
more than 30 characters), but still much slower than comparing two
integers. If you can somehow change the string comparison to an integer
comparison, and these comparisons are performed millions of times in the
innermost loop of your code, the program may speed up considerably. This
is a kind of optimization which is completely out of the realm of
compiler optimizations.
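
As an illustration of that kind of transformation, here is a minimal
string-interning sketch (the class and its names are hypothetical, not
from the thread): each distinct string is mapped to a small integer
once, so the hot loop can compare ints instead of strings.

    #include <string>
    #include <unordered_map>
    #include <vector>

    class Interner {
    public:
        // Return the ID for s, assigning a fresh one on first sight.
        int id(const std::string& s) {
            auto it = ids_.find(s);
            if (it != ids_.end())
                return it->second;
            int fresh = static_cast<int>(names_.size());
            ids_.emplace(s, fresh);
            names_.push_back(s);
            return fresh;
        }
        // Recover the original string from an ID.
        const std::string& name(int id) const { return names_[id]; }
    private:
        std::unordered_map<std::string, int> ids_;
        std::vector<std::string> names_;
    };

Whether this pays off depends on how many comparisons are done per
string after the one-time interning cost.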
 

Jerry Coffin

[ ... ]
I've never seen a factor of 10 here. Even a factor of 2 is
exceptional (and usually indicative of a very poor optimizer in
the compiler).

I've seen a factor of 10, and even a _little_ bit more -- but it was
_extremely_ hardware specific, depending almost entirely on the cache
organization of the processor. As the data was originally placed in
memory, two crucial items ended up contending for the same cache lines.
Rearranging the data allowed those items to live in different cache
lines.

In this case, the CPU involved wasn't even made when the compiler was
written, so I have a hard time blaming the compiler for the poor
optimization. Unfortunately, I don't know of any newer compilers that
would do much better (and when they do, it's mostly by accident...)
 

Jerry Coffin

Juha said:
[...]
Personally I prefer to write 100 lines of clear, readable and
maintainable code, even if it's 1% slower than 1000 lines of code
worthy of the IOCCC.

It's always easier to make a bug-free program fast than to
bug-fix a fast program. :)

Oft repeated, but not really true. Sometimes a fast program still
contains a trivial bug, but I can remember one program in particular
that was bug-free as far as I know, but a horrible mess, to the point
that by _far_ the easiest way to improve it was to write a set of
specifications for what it did, and start over. By the time I realized
that, however, I'd spent more time trying to optimize one small part
than it eventually took to rewrite the whole thing...
 

Phil Carmody

Juha Nieminen said:
Wikipedia says: "Lookup requires inspection of just two locations in
the hash table, which takes constant time in the worst case".

I don't understand how that is possible. Suppose an element is
inserted into the table. Then later a new element displaces this first
element to a second location. Then later another new element, to be put
in this second location, displaces this first element to a third
location. How exactly do you find this first element anymore? It's not
in either of the two alternative locations defined by the two hashing
functions for that element.

The insert can be quite a complicated procedure. It might be
amortized O(1), but worst-case, it's O(epic fail).

Phil
 

Jerry Coffin

[ ... hash table operations ]
The insert can be quite a complicated procedure. It might be
amortized O(1), but worst-case, it's O(epic fail).

That depends on how the hash table is designed. It's entirely possible
(for one example) to create a hash table that uses balanced trees for
collision resolution. This gives a worst case of O(log N), which isn't
really all that awful at all.
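
A minimal sketch of that design (names are hypothetical; std::map
stands in for the balanced tree):

    #include <cstddef>
    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    // Fixed-size hash table whose buckets are balanced trees, so even a
    // pathological hash function degrades to O(log N) rather than O(N).
    class TreeBucketTable {
    public:
        explicit TreeBucketTable(std::size_t nbuckets = 1024)
            : buckets_(nbuckets) {}

        void insert(const std::string& key, int value) {
            buckets_[index(key)][key] = value;
        }
        const int* find(const std::string& key) const {
            const auto& b = buckets_[index(key)];
            auto it = b.find(key);
            return it == b.end() ? nullptr : &it->second;
        }
    private:
        std::size_t index(const std::string& key) const {
            return std::hash<std::string>{}(key) % buckets_.size();
        }
        std::vector<std::map<std::string, int>> buckets_;
    };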
 

user923005

[ ... hash table operations ]
The insert can be quite a complicated procedure. It might be
amortized O(1), but worst-case, it's O(epic fail).

That depends on how the hash table is designed. It's entirely possible
(for one example) to create a hash table that uses balanced trees for
collision resolution. This gives a worst case of O(log N), which isn't
really all that awful at all.

I like this scheme:
Say that you want a hash table of 2^20 elements...
Create a hash table of 2^20 elements,
Create a shadow hash table of 2^16 elements
Create a shadow hash table of 2^12 elements
Create a shadow hash table of 2^8 elements, each of which is a
skiplist holding the data type.

You form your 32-bit hash using some nice algorithm like Bob Jenkins'
hash or Goulburn hash.
You mask off the lower 20 bits and this is the address for the large
table.
If there is a collision, you mask off 4 more bits and use the second
largest table.
If there is a collision, you mask off 4 more bits and use the third
largest table.
If there is a collision you take the low byte of the hash and insert
into that skiplist.

If you get a lot more data than you expect, you construct a new larger
table of 2^24 elements and put it in the front as the next new
layer.

Of course, if you want, you can rehash the data each time, but I think
it is better to simply mask. That way, if you suddenly start growing
your skiplists and the first level tables are lightly populated, then
you can detect what is obviously a denial of service attack.
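
A rough sketch of the layered lookup described above (all names are
hypothetical, and the final skiplist is simplified to a plain vector):

    #include <cstdint>
    #include <string>
    #include <vector>

    struct Entry { std::string key; bool used = false; };

    struct LayeredTable {
        std::vector<Entry> l0 = std::vector<Entry>(1u << 20);  // main table
        std::vector<Entry> l1 = std::vector<Entry>(1u << 16);  // first shadow
        std::vector<Entry> l2 = std::vector<Entry>(1u << 12);  // second shadow
        std::vector<std::vector<std::string>> spill =
            std::vector<std::vector<std::string>>(1u << 8);    // skiplist stand-in

        // The 32-bit hash is computed once; each layer just masks off bits.
        static std::uint32_t mask(std::uint32_t h, unsigned bits) {
            return h & ((1u << bits) - 1u);
        }

        void insert(const std::string& key, std::uint32_t h) {
            std::vector<Entry>* layers[] = { &l0, &l1, &l2 };
            unsigned bits[]              = { 20, 16, 12 };
            for (int i = 0; i < 3; ++i) {
                Entry& e = (*layers[i])[mask(h, bits[i])];
                if (!e.used) { e.key = key; e.used = true; return; }
                if (e.key == key) return;           // already present
            }
            auto& bucket = spill[mask(h, 8)];       // last resort: low byte
            for (const auto& k : bucket)
                if (k == key) return;
            bucket.push_back(key);
        }

        bool contains(const std::string& key, std::uint32_t h) const {
            const std::vector<Entry>* layers[] = { &l0, &l1, &l2 };
            unsigned bits[]                    = { 20, 16, 12 };
            for (int i = 0; i < 3; ++i) {
                const Entry& e = (*layers[i])[mask(h, bits[i])];
                if (e.used && e.key == key) return true;
            }
            for (const auto& k : spill[mask(h, 8)])
                if (k == key) return true;
            return false;
        }
    };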
 

Phil Carmody

Jerry Coffin said:
[ ... hash table operations ]

uhuh - specifically cuckoo hash operations.
That depends on how the hash table is designed. It's entirely possible

[... to not use a cuckoo hash ...]

Yes, but then it wouldn't be the hash design whose Big-Ohs were
being debated.

Phil
 

Juha Nieminen

Richard said:
It doesn't work like that, though you had me perplexed for a
moment. The first element goes back to its original location,
displacing the second. Thus, suppose that h0, h1, etc are
locations and the first three elements are e0, e1, e2, and that
their address pairs are:

e0: h0,h1
e1: h0,h2
e2: h1,h3

Initially e0 is inserted in location h0. When e1 is inserted, e0
is moved to h1, and e1 is inserted in h0. The contents of the
table look like this:

h0: e1
h1: e0
h2: -
h3: -

Now e2 is inserted; it goes into location h1, displacing e0. In
turn e0 goes back to its alternate location h0, displacing e1,
which has to go to its alternate location, h2. The table now
looks like this:

h0: e0
h1: e2
h2: e1
h3: -

I'm not 100% sure I fully understand it, but anyway, it sounds to me
like insertion is O(n) like that (each displaced element could
potentially displace an existing element, which could potentially
displace an existing element, etc, potentially going through O(n)
elements without it going into an infinite loop).

I don't even see how insertion is amortized constant-time. It sounds
to me like insertion is not only O(n), but also could potentially be
amortized linear-time, especially if poor hashing functions are chosen.

There was no conditional ("with properly designed hashing functions")
in the description given by wikipedia for the amortized constant time
insertion, but it made it sound like it's *always* amortized constant
time, regardless of the chosen hashing functions... I can't understand
how it's possible.

Heck, in fact I think it's rather trivial to demonstrate that
insertion can be O(infinite), by simply using these two hashing functions:

f1(x) = 0
f2(x) = 1

With these hashing functions it's impossible to insert more than two
elements in the hash table. Trying to insert a third one is impossible.

Of course these two functions are absolutely extreme, but I think they
demonstrate that the hash table can *not* have constant-time insertion
(or even linear-time, for that matter) for all possible hashing
functions. At most this may be possible for *some* hashing functions.
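
For what it's worth, a minimal cuckoo-insertion sketch along the lines
of the quoted description (the two hash functions here are crude
placeholders; a real implementation rehashes with fresh hash functions
when the kick limit is hit, which is where both the amortized O(1)
claim and the failure case come from):

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <stdexcept>
    #include <string>
    #include <utility>
    #include <vector>

    class CuckooSet {
    public:
        explicit CuckooSet(std::size_t n) : slots_(n) {}

        bool contains(const std::string& key) const {
            return match(h1(key), key) || match(h2(key), key);
        }

        void insert(std::string key) {
            if (contains(key)) return;
            std::size_t pos = h1(key);
            for (int kick = 0; kick < kMaxKicks; ++kick) {
                if (!slots_[pos]) { slots_[pos] = std::move(key); return; }
                std::swap(key, *slots_[pos]);   // evict the current occupant
                // send the evicted key to its *other* possible slot
                pos = (pos == h1(key)) ? h2(key) : h1(key);
            }
            throw std::runtime_error("kick limit hit: rehash with new functions");
        }

    private:
        static constexpr int kMaxKicks = 32;

        bool match(std::size_t i, const std::string& key) const {
            return slots_[i] && *slots_[i] == key;
        }
        std::size_t h1(const std::string& k) const {
            return std::hash<std::string>{}(k) % slots_.size();
        }
        std::size_t h2(const std::string& k) const {
            return (std::hash<std::string>{}(k) / slots_.size()) % slots_.size();
        }

        std::vector<std::optional<std::string>> slots_;
    };

With degenerate hash functions like the f1(x)=0, f2(x)=1 pair above,
a third insert can never find a free slot and the kick limit is hit,
which is exactly the failure mode being described.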
 

CBFalconer

Phil said:
Jerry Coffin said:

[ ... hash table operations ]

uhuh - specifically cuckoo hash operations.
That depends on how the hash table is designed. It's entirely
possible ...

[... to not use a cuckoo hash ...]

Yes, but then it wouldn't be the hash design whose Big-Ohs were
being debated.

I gather this sub-thread applies specifically to the cuckoo hash,
with which I am not familiar. However note that the design of
hashlib specifically prevents the hashtable being more than about
88% full, and normally is in the 44 to 88% range. This
specifically prevents the occurrence of long trails with any
reasonable hash function. The organization does specifically
require table reorganization at intervals. The result is O(1)
operation. The reorganization is considerably cheaper than the
initial construction of that table, because it is not necessary to
check equality in the rehashing operation (we know that the
existing table has no duplicates). At any rate, see:

<http://cbfalconer.home.att.net/download/hashlib.zip>
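
A hedged sketch of the reorganization rule being described (the ~88%
threshold is from the post; the open-addressing layout and names here
are illustrative, not hashlib's actual code):

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    struct Slot { std::string key; bool used = false; };

    // When the table passes ~88% occupancy, allocate a larger array and
    // re-insert every used slot.  As noted above, this pass needs no
    // equality tests, because the old table is already duplicate-free.
    void reorganize(std::vector<Slot>& table, std::size_t count)
    {
        if (count * 100 < table.size() * 88)
            return;                                   // still under 88% full
        std::vector<Slot> bigger(table.size() * 2);
        for (const Slot& s : table) {
            if (!s.used) continue;
            std::size_t i = std::hash<std::string>{}(s.key) % bigger.size();
            while (bigger[i].used)                    // probe; no equality check
                i = (i + 1) % bigger.size();
            bigger[i] = s;
        }
        table.swap(bigger);
    }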
 

user923005

  I'm not 100% sure I fully understand it, but anyway, it sounds to me
like insertion is O(n) like that (each displaced element could
potentially displace an existing element, which could potentially
displace an existing element, etc, potentially going through O(n)
elements without it going into an infinite loop).

  I don't even see how insertion is amortized constant-time. It sounds
to me like insertion is not only O(n), but also could potentially be
amortized linear-time, especially if poor hashing functions are chosen.

  There was no conditional ("with properly designed hashing functions")
in the description given by wikipedia for the amortized constant time
insertion, but it made it sound like it's *always* amortized constant
time, regardless of the chosen hashing functions... I can't understand
how it's possible.

  Heck, in fact I think it's rather trivial to demonstrate that
insertion can be O(infinite), by simply using these two hashing functions:

f1(x) = 0
f2(x) = 1

  With these hashing functions it's impossible to insert more than two
elements in the hash table. Trying to insert a third one is impossible.

  Of course these two functions are absolutely extreme, but I think they
demonstrate that the hash table can *not* have constant-time insertion
(or even linear-time, for that matter) for all possible hashing
functions. At most this may be possible for *some* hashing functions.

Sure. If hash(x) is simply "return 1;" then bad behavior is to be expected.
The assumption is of maximally cascade-free hash functions like UMAC
or Bob Jenkins' hash.

There are also little tweaky things done to make the "real-life"
behavior of Cuckoo hash better (there is an article with a title
something like "Hash with a Stash" that explains it in detail.)

Big-O analysis is an examination of average case behavior. If your
hash table is too small and you do not make provision to enlarge it,
then it can suffer from attack by an opponent, and so we should also
carefully examine worst-case behavior.

For example, suppose you have a small hash table with 2000 entries
that overflows into link-lists.
Someone tests one trillion strings and finds three million strings
that map to 4 distinct hash codes (because they have somehow learned
your hash algorithm).
The opponent sends these three million strings to your hash table.
Obviously, we are going to get some long chains, and we will have to
linear search them to discover if an item is in the table or not.
These attacks are realistic and do happen in real-life (it turns out
that the world has a surprisingly large population of whinging twits).

Now, if we had made the table a lot larger (e.g. 64000 entries) it
would be a lot harder to find strings that map to the same value.
And if we had made the spill strategy more robust (e.g. a skiplist
instead of an ordinary linked list) then the behavior never goes
catastrophic.

It's too bad that we have to plan for whinging twits when designing
data structures, but bad or skewed sequences do happen from time to
time in real life also, so it is not entirely wasted effort.
 

Juha Nieminen

user923005 said:
Big-O analysis is an examination of average case behavior.

Not true. Big-O denotes the asymptotic worst case behavior.

One can say "the algorithm is O(n lg n) *on average*", but AFAIK
that's a rather shaky definition. The big-O notation always denotes the
upper bound of the asymptotic behavior of the algorithm. In other words,
any asymptotic behavior of the algorithm with any given data will always
fall below the big-O function curve. If with some data the algorithm
behaves asymptotically slower than the given function, then that's not
the true big-O function for that algorithm.

For example quicksort is O(n^2). (Don't believe what people say. It
*is* O(n^2), period.)
 

Kai-Uwe Bux

Juha said:
Not true. Big-O denotes the asymptotic worst case behavior.

One can say "the algorithm is O(n lg n) *on average*", but AFAIK
that's a rather shaky definition. The big-O notation always denotes the
upper bound of the asymptotic behavior of the algorithm. In other words,
any asymptotic behavior of the algorithm with any given data will always
fall below the big-O function curve. If with some data the algorithm
behaves asymptotically slower than the given function, then that's not
the true big-O function for that algorithm.

For example quicksort is O(n^2). (Don't believe what people say. It
*is* O(n^2), period.)

Big-O notation is oblivious to whether it is used to say something about the
worst case or about the average case. It also does not care whether you say
something about time or space complexity. In fact, Big-O is a purely
mathematical concept: Given two functions f and g, we say that f is
O(g) if there is a constant C such that f(x) <= C g(x) for all x.

For computer science, f is often the worst case runtime of an algorithm
depending on some input length n (in some measure). So, we say that this
algorithm has worst case time complexity O(n^2) if the worst case run time
is bounded from above by C n^2 for some C. However, f could be the
average runtime or it could be the worst case space complexity, or some
other function. Thus, you can use Big-O notation also to say something
about worst case space complexity, average case time complexity, or other
complexities. The Big-O won't care.
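
In symbols (a restatement of the paragraph above; the usual textbook
form also allows the bound to hold only for sufficiently large x):

    f \in O(g) \iff \exists C > 0,\ \exists x_0 \text{ such that } f(x) \le C\,g(x) \text{ for all } x \ge x_0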


Best

Kai-Uwe Bux
 

CBFalconer

Juha said:
Not true. Big-O denotes the asymptotic worst case behavior.

One can say "the algorithm is O(n lg n) *on average*", but AFAIK
that's a rather shaky definition. The big-O notation always
denotes the upper bound of the asymptotic behavior of the
algorithm. In other words, any asymptotic behavior of the
algorithm with any given data will always fall below the big-O
function curve. If with some data the algorithm behaves
asymptotically slower than the given function, then that's not
the true big-O function for that algorithm.

For example quicksort is O(n^2). (Don't believe what people say.
It *is* O(n^2), period.)

I suspect you know what you are talking about, but you are leaving
the wrong impression. The sort of sequence that produces O(n*n)
quicksort performance is extremely rare, and very slight controls
make it even rarer (such as selecting the median of 3 items as the
value to partition about).

The result is that for most cases quicksort will be the fastest
sort. Right behind it are other O(n log n) methods. I prefer
mergesort with the data input in lists, because then I don't have to
worry about the size to be sorted. In addition, mergesort is
stable (quicksort is not).
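
For illustration, the "median of 3 items" control mentioned above, as a
small sketch (assumes random-access iterators and a non-empty range):

    #include <algorithm>

    // Sort the first, middle and last elements among themselves and return
    // the middle position; *mid is then the median of the three and makes a
    // much safer pivot than blindly taking the first element.
    template <typename Iter>
    Iter median_of_three(Iter first, Iter last)
    {
        Iter mid = first + (last - first) / 2;
        --last;                                  // point at the final element
        if (*mid < *first)  std::iter_swap(mid, first);
        if (*last < *first) std::iter_swap(last, first);
        if (*last < *mid)   std::iter_swap(last, mid);
        return mid;
    }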
 
