which hash table - chained or open-addressed?

J

Julia

I checked the wikipedia link and it helps a lot. In my case, the key
value is relatively continuous but with a few empty slots and a few
outside-range values, for example, 3, 102, 103, 104, 106, 107, 108,
109, 110, 112, 113, 114, 115, 116, ....120, 13000. However, for the
different calling to hash table, the starting value can be very
different, for example, 102 at one calling, and 20333556 at another
calling. I guess open-addressing hash table should be OK for my
application. Am I right?

Best regards,

Julia
 
J

Julia

I checked the wikipedia link and it helps a lot. In my case, the key
value is relatively continuous but with a few empty slots and a few
outside-range values, for example, 3, 102, 103, 104, 106, 107, 108,
109, 110, 112, 113, 114, 115, 116, ....120, 13000. However, for the
different calling to hash table, the starting value can be very
different, for example, 102 at one calling, and 20333556 at another
calling. I guess open-addressing hash table should be OK for my
application. Am I right?

Best regards,

Julia
 
B

Ben Pfaff

CBFalconer said:
You are a blithering idiot, besides an unmannered boor. You have
no idea whom you are talking to, not that it matters. After
something like 50 years of building machines, and getting maximum
performance out of them, I think I have some idea of the
fundamentals and their relative importance. You, on the other
hand, do not appear to.

On what processor do division and logical AND have similar
performance? The _IA-32 Intel Architecture Optimization
Reference Manual_ I have here shows division having a minimum
latency of 56 cycles and a maximum throughput of 1 per 23 cycles.
The corresponding figures for logical AND shows 0.5 cycles
latency and throughput.
 
J

Julia

I've measured the performance of both, and under ideal conditions for
both, they are essentially identical, because the main overhead is
identical in both (the hash and the compare functions). Chained hash
tables are generally much easier to code. With "open address" hash
tables, implementation requires a very complicated and carefully
designed hashing system (dealing with deletes, and making a correct
chaining mechanism, for example). Open address hash tables have the
advantage that you can embed the contents of a struct in each entry
while with chained hashing you have to have either an additional pair
of "next, prev" pointers in your struct or else otherwise provide an
additional container for each entry.

Thanks for your reply. It explains a lot. In my case, the key
value is relatively continuous but with a few empty slots and a few
outside-range values, for example, 3, 102, 103, 104, 106, 107, 108,
109, 110, 112, 113, 114, 115, 116, ....120, 13000. However, for the
different calling to hash table, the starting value can be very
different, for example, 102 at one calling, and 20333556 at another
calling. And I do not need deletes in my case. I guess open-addressing
hash
table should be OK for my application.
There are a lot of classic mistakes made during implementation of hash
tables that can reduce their performance. 1) If you use a prime-sized
table (as some books recommend) you will pay a horrendous penalty for
the % operator on every access; always use a power of 2 sized table
(and an odd, or in fact, prime numbered skip value for the open address
hash table.) 2) If you fail to store the full hash function value
along with the entry, you will be forced to call the native comparison
function for each associated entry during a search -- by pre-screening
by the original hash value, if your hash function is of high quality,
you will reduce the cost by almost never calling the comparison
function *except* on exact matches (thus also improving whatever the
branch prediction performance is in your comparison function).

From your suggestion, I plan to choose the table size as a power of 2,
skip value
as an odd number, like a power of 2 minus 1. Is that a good choice or
you have
other suggestions?

Thanks a lot,
 
J

Julia

William said:
I wrote out a long post, then deleted it. This article should have all or
most of the answers you desire:

http://en.wikipedia.org/wiki/Hash_table#Chaining
http://en.wikipedia.org/wiki/Hash_table#Open_addressing

Although maybe it doesn't clearly state the most important, and I would
hope obvious, point: it depends.

Thanks for your reply. I checked your wikipedia link
and it explains a lot. Obviously, application depandent.
In my case, the key value is relatively continuous but with
a few empty slots and a few outside-range values, for example,
3, 102, 103, 104, 106, 107, 108, 109, 110, 112, 113, 114,
115, 116, ....120, 13000. However, for the different calling
to hash table, the starting value can be very different,
for example, 102 at one calling, and 20333556 at another
calling. And I do not need deletes in my case. I guess
open-addressing hash table should be OK for my application.
Am I right,

Julia
 
C

CBFalconer

Ben said:
.... snip ...

On what processor do division and logical AND have similar
performance? The _IA-32 Intel Architecture Optimization
Reference Manual_ I have here shows division having a minimum
latency of 56 cycles and a maximum throughput of 1 per 23 cycles.
The corresponding figures for logical AND shows 0.5 cycles
latency and throughput.

It only matters if it makes an appreciable difference in the
overall performance. In most cases it doesn't (including the hash
technique). If it is at the heart of inner loop, the answer may
well be different. Otherwise clarity and accuracy are much more
important.

The time to worry about speed is when the delays become a nuisance.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 
L

Lane Straatman

On what processor do division and logical AND have similar
performance? The _IA-32 Intel Architecture Optimization
Reference Manual_ I have here shows division having a minimum
latency of 56 cycles and a maximum throughput of 1 per 23 cycles.
The corresponding figures for logical AND shows 0.5 cycles
latency and throughput.
Can you think of a smallish, compilable example in C that would show this on
a given machine? LS
 
D

Default User

Spiros said:
She didn't top-post !

I suggest you look at the message again. She posted about five, the one
I replied to (the first I saw) WAS top-posted





Brian
 
B

Ben Pfaff

CBFalconer said:
It only matters if it makes an appreciable difference in the
overall performance. In most cases it doesn't (including the hash
technique). If it is at the heart of inner loop, the answer may
well be different. Otherwise clarity and accuracy are much more
important.

That's true of a program but when designing a general-purpose
library it's best to consider performance up front. I don't
understand why you'd go to the trouble of designing a hash
library without considering performance. That's the point of a
hash table, to go fast. Otherwise you can use a simpler data
structure (an array) or one with guaranteed performance (a
balanced tree).
 
W

websnarf

/*
Lane said:
Can you think of a smallish, compilable example in C that would show this on
a given machine? LS
*/

#include <stdio.h>
#include <time.h>

int x, y; /* Extern to keep compiler from optimizing it out. */

int main () {
int i, cc;
clock_t t0, t1, t2;

x = y = 1;

for (;;) {
for (cc = 1000; ; cc+= cc) {
t0 = clock ();
for (i=0; i < cc; i++) {
x %= y;
x %= y;
x %= y;
x %= y;
}
t1 = clock();
for (i=0; i < cc; i++) {
x &= y;
x &= y;
x &= y;
x &= y;
}
t2 = clock();

if (t1 - t0 > 1000 && t2 > t1) {
printf ("Performance ratio: %f\n", (double)(t1 - t0) /
(double)(t2 - t1));
return 0;
}
}
}
return 0;
}

/*
 
B

Ben Pfaff

#include <stdio.h>
#include <time.h>

int x, y; /* Extern to keep compiler from optimizing it out. */

int main () {
int i, cc;
clock_t t0, t1, t2;

x = y = 1;

for (;;) {
for (cc = 1000; ; cc+= cc) {
t0 = clock ();
for (i=0; i < cc; i++) {
x %= y;
x %= y;
x %= y;
x %= y;
}
t1 = clock();
for (i=0; i < cc; i++) {
x &= y;
x &= y;
x &= y;
x &= y;
}
t2 = clock();

if (t1 - t0 > 1000 && t2 > t1) {

I changed the arbitrary value of "1000" here to CLOCKS_PER_SEC,
because CLOCKS_PER_SEC is 1000000 here but it only changes value
100 or 1000 times per second (not sure which).
printf ("Performance ratio: %f\n", (double)(t1 - t0) /
(double)(t2 - t1));
return 0;
}
}
}
return 0;
}

On my machine here, compiled with full optimization, this most
often prints 58 as the ratio.
 
W

websnarf

Julia said:
Thanks for your reply. It explains a lot. In my case, the key
value is relatively continuous but with a few empty slots and a few
outside-range values, for example, 3, 102, 103, 104, 106, 107, 108,
109, 110, 112, 113, 114, 115, 116, ....120, 13000. However, for the
different calling to hash table, the starting value can be very
different, for example, 102 at one calling, and 20333556 at another
calling. And I do not need deletes in my case. I guess open-addressing
hash
table should be OK for my application.

It is ok, but its still harder to code. You have to deal with the
table overflowing, for example.
skip value
as an odd number, like a power of 2 minus 1. Is that a good choice or
you have
other suggestions?

A constant skip value can lead to a really bad domino effect of
interfering chains in the worst case. You usually want to choose a
different skip value on either a per entry basis (using some hashed
transformation of your input data), or merely per-slot bases (after
hashing, or masking in your case.) The simplest way to pick the skip
value is to take the bits of your input that were masked off, and use
it as an index into a list of skip values. While just the odd numbers
will typically work fairly well (and has the advantage of not needing a
table lookup), you can do slightly better by picking from a medium
sized list of small primes (maybe 256 primes, for example.)

So if you are masking b bits for the initial hash then the hash index
is "key & ((1 << b) - 1)" and the skip value can be obtained by
"(short) primeTable[(key >> b) & 255]". In this way, the key has a
deterministic hash index and skip value, but the chances that a given
other key has the same key and skip is 1 in 2**(b+8), thus minimizing
the sort of domino effect. If you want to prototype something that
works before going off and building a prime table, you can always just
use odd numbers for the skip value: "(key >> (b-1)) | 1"
 
U

user923005

Ben said:
I changed the arbitrary value of "1000" here to CLOCKS_PER_SEC,
because CLOCKS_PER_SEC is 1000000 here but it only changes value
100 or 1000 times per second (not sure which).


On my machine here, compiled with full optimization, this most
often prints 58 as the ratio.

Dividing by 4 gives 14.5 (why, after all, are we performing 4 mod and 4
&'s per cycle?)
The difference in speed between a modulus verses & operations in a hash
table will be less than 1% for most reasonable usages. The time spent
computing the hash will be a lot more than the simple operation to find
out which slot it goes into.

If your hash table contains trees or skiplists in the nodes, then it
has the guaranteed performance of a btree in the worst case and the
O(1) performance of a hash table in the best case. It makes me wonder
why all hash tables are not constructed in that way, since it is as
easy as falling off of a log.
 
U

user923005

P.S.
This thread is somewhat overly contentious.

Both C. B. Falconer and Paul Hsieh are intelligent and useful
contributors with a long history of helpful posts to all the USENET
groups to which they contribute.

Neither is stupid or lacking knowlege in the area of hash table
creation. I would rank both of them as expert.

If there is to be an argument about the best way to construct a hash
table, I suggest a contest to see which general purpose hash table
performs best in real world situations (short strings, strings of
arbitrary length, integers, long long integers and doubles with a large
class of disparate distributions given as inputs).

If the contest is to be held in then only ANSI C can
be used, but C90 or C99 should both be given equal footing.
 
B

Ben Pfaff

user923005 said:
Dividing by 4 gives 14.5 (why, after all, are we performing 4 mod and 4
&'s per cycle?)

Why did you divide by 4? (4*a)/(4*b) = a/b, not 4*(a/b).

I assumed that it was done four times per iteration to minimize
loop overhead.
If your hash table contains trees or skiplists in the nodes, then it
has the guaranteed performance of a btree in the worst case and the
O(1) performance of a hash table in the best case. It makes me wonder
why all hash tables are not constructed in that way, since it is as
easy as falling off of a log.

What bothers me with sticking a tree or a skiplist in each hash
node is the memory expense. It's going to cost at least an extra
pointer field or two per hash node. When I'm hashing small data
items that seems unduly expensive. For big data, sure, it makes
sense.
 
S

Spiros Bousbouras

Default said:
I suggest you look at the message again. She posted about five, the one
I replied to (the first I saw) WAS top-posted

At this point in time I see 2 posts by Julia not
counting the opening one and neither contains
top-posting.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,682
Members
48,796
Latest member
Greg L.

Latest Threads

Top