[snip]
re: counterargument to storing the hash code.
- If the table stores pointers to items rather than item
instances, storing hash values in the table roughly
doubles the amount of memory required. If the same
amount of memory were used for a pointers-only table,
the number of table slots would double and the load
factor would be halved. This should lead to shorter
searches -- and it is not at all clear _a priori_ that
more fast probes will outrace fewer slow probes.
This doesn't sound right - ordinarily a table has to hold a
key/value pair. Adding a hash would increase the load factor by
50% - for a given amount of memory.
I think you said the same thing I did; perhaps we have a
confusion of terminology. (Not surprising, since "hash table"
itself is not "a" thing, but a wide and varied family of things.)
My point is that the choice between a pointer/hash pair and
a pointer alone is not clear-cut. Storing the pair allows one
to short-circuit the (possibly expensive) comparisons when the
search key's hash and the stored hash disagree. For the same
expenditure of memory, a pointer-only table could have twice as
many slots and a corresponding reduction in collision frequencies,
leading to fewer probes per search.
Hmmm: Maybe our disagreement is about where the key lives.
I've been assuming that the key is part of the item, so the
choice is between pointer/hash and pointer. But maybe you're
assuming the key is a separate entity, so the choice is between
key_pointer/data_pointer and key_pointer/data_pointer/hash. The
argument still holds with a different expansion factor: Storing
the hash now adds 50% to the table size, while leaving it out
allows 33% more slots and a load factor reduced by one-third.