Steven D'Aprano
For all practical purposes, dicts have almost constant access time (at
least with any half-decent __hash__ method).
Hash tables are slick, but their hash function is their weakest link.
[ hash( 2**x ) for x in range( 0, 256, 32 ) ]
[1, 1, 1, 1, 1, 1, 1, 1]
That's not the hash function of a dict. Dicts don't have a hash function:
hash({})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dict objects are unhashable
What you're showing is the hash function of ints, and you've discovered
that there are some collisions. This is hardly surprising. There are an
infinite number of ints, and only a finite number of values that they can
hash to, so there have to be some collisions. It's worth looking at the
values that collide:
[ 2**x for x in range( 0, 256, 32 ) ]
[1, 4294967296L, 18446744073709551616L, 79228162514264337593543950336L,
 340282366920938463463374607431768211456L,
 1461501637330902918203684832716283019655932542976L,
 6277101735386680763835789423207666416102355444464034512896L,
 26959946667150639794667015087019630673637144422540572481103610249216L]
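Why these particular values collide: the session above looks like a 32-bit Python 2 build, where the hash of a long is effectively the value reduced modulo 2**32 - 1, and every 2**(32*k) is congruent to 1 under that modulus. (Python 3 reduces modulo 2**61 - 1 on typical 64-bit builds, exposed as sys.hash_info.modulus, so these exact keys don't collide there.) A quick check of the congruence:

# Each exponent above is a multiple of 32, and 2**32 is congruent to 1
# modulo 2**32 - 1, so on such a build all of these powers reduce to the
# same hash value.
print(all(pow(2, x, 2**32 - 1) == 1 for x in range(32, 256, 32)))   # True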
So if you have a dict with keys equal to those numbers, and do a lookup
on one of them, you'll get O(N) performance instead of O(1) *for those
keys that collide*. I think I can live with that.
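To put that in concrete terms, here's a sketch (CollidingKey is purely illustrative, not anything from this thread or the standard library) that forces every key into the same hash bucket, so each lookup has to fall back on equality comparisons against the colliding keys:

import timeit

class CollidingKey(object):
    """Illustrative key type whose instances all hash alike, like the 2**x keys above."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 1                                  # every instance lands in the same bucket
    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.value == other.value

good = dict((i, None) for i in range(1000))               # 1000 keys, distinct hashes
bad = dict((CollidingKey(i), None) for i in range(1000))  # 1000 keys, all colliding

probe = CollidingKey(999)
print(timeit.timeit(lambda: 999 in good, number=10000))   # stays quick: one probe, give or take
print(timeit.timeit(lambda: probe in bad, number=10000))  # crawls: each lookup walks the colliding keys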
If you think you can come up with a better hash function, feel free to
subclass int.
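For instance, a rough sketch of what that could look like (Python 3 syntax, since int there is unbounded; on the 32-bit Python 2 build shown above you'd subclass long instead, and BigKey is a name I just made up):

class BigKey(int):
    def __hash__(self):
        # Fold the bit length into the hash so values that the default
        # reduction collapses together (2**32, 2**64, ...) get spread apart.
        return hash((int(self), self.bit_length()))

d = {BigKey(2 ** x): x for x in range(0, 256, 32)}
print(len({hash(k) for k in d}))   # 8 distinct hash values for the 8 keys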
Such an index gives you linear-time performance in finding elements. Ha!
In any real application, all your keys are highly unlikely to collide in
that fashion.
Plus, your worst-case insertion will cause a copy of the entire table.
Explain please.
There aren't any mappings based on balanced trees, unfortunately.
I'm not sure why you think O(log N) is better than O(1). The major
advantage of tree structures is that, unlike hash tables, they are
ordered, not that they don't have collisions.
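And if ordered traversal is what you actually want, you don't need a tree in the standard library to get it. Here's a sketch (SortedDict and its method names are invented for illustration) that keeps a sorted key list beside a plain dict, using the bisect module for the ordering and leaving the O(1) lookups to the dict:

import bisect

class SortedDict(object):
    """A dict plus a sorted key list; not a real tree, just an illustration."""
    def __init__(self):
        self._data = {}
        self._keys = []                          # kept sorted at all times

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)       # O(log N) search, O(N) insert
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]                   # ordinary O(1) dict lookup

    def items_in_order(self):
        return [(k, self._data[k]) for k in self._keys]

sd = SortedDict()
for k in (5, 1, 3):
    sd[k] = k * k
print(sd.items_in_order())   # [(1, 1), (3, 9), (5, 25)]

A genuine balanced tree would make the insert O(log N) as well; the point is just that ordering, not collision-avoidance, is what a tree buys you over a dict.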