Universal hashing

D

desktop

Maybe not specific related to C++ but inspired by the recent discussions
about unordered_set in the upcoming C++x0 release.

When using universal hashing a new hash-function from a family of
hash-functions is used to map a key into the table.

Lets assume that hash-function H1 has been used to map key k1 to the
table and H2 to map k2 to the table.

I would now like to make a lookup on k1. But from the point of view of
lookup how does it know which hash-function to use?
 
T

tommy.hinks

Maybe not specific related to C++ but inspired by the recent discussions
about unordered_set in the upcoming C++x0 release.

When using universal hashing a new hash-function from a family of
hash-functions is used to map a key into the table.

Lets assume that hash-function H1 has been used to map key k1 to the
table and H2 to map k2 to the table.

I would now like to make a lookup on k1. But from the point of view of
lookup how does it know which hash-function to use?

Presumably this hash-function would be a template parameter to the
std::unordered_set, and would therefore be the same for all elements
in the set. Using different hash-function when inserting elements
defeats the purpose of hashing. The way that today's std::set<> and
std::map<> work are by specifying a predicate operator for equivalence
testing is very similar to the above argument.

T
 
D

desktop

Presumably this hash-function would be a template parameter to the
std::unordered_set, and would therefore be the same for all elements
in the set. Using different hash-function when inserting elements
defeats the purpose of hashing. The way that today's std::set<> and
std::map<> work are by specifying a predicate operator for equivalence
testing is very similar to the above argument.

T

Universal hashing as I understand is used to avoid that all elements (in
theory possible) hash to the same "bucket". Its a method to distribute
the keys as even as possible.

This means that you don't provide a single hash-function as a template
parameter but a family of hash-functions. But I still don't see how this
can work when you make a lookup.
 
O

outlaw

Lets assume that hash-function H1 has been used to map key k1 to the
table and H2 to map k2 to the table.

I would now like to make a lookup on k1. But from the point of view of
lookup how does it know which hash-function to use?


Read the TR1 specs here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
Specifically page 98 has the unordered_set<> header:


namespace std {
namespace tr1 {
// [6.3.4.3] Class template unordered_set
template <class Value,
class Hash = hash<Value>,
class Pred = std::equal_to<Value>,
class Alloc = std::allocator<Value> >
class unordered_set;

section 6.3.1 describes the hash as

3.
Each unordered associative container is parameterized by Key, by a
function object Hash that acts as a hash function
for values of type Key, and by a binary predicate Pred that induces an
equivalence relation on values of type Key.
Additionally, unordered_map and unordered_multimap associate an
arbitrary mapped type T with the Key.

4.
A hash function is a function object that takes a single argument of
type Key and returns a value of type std::size_t.

So to answer your question, the container knows at compile time which
hash function to use.

(e-mail address removed)
http://securitymario.spaces.live.com/
 
D

desktop

outlaw said:
Lets assume that hash-function H1 has been used to map key k1 to the
table and H2 to map k2 to the table.

I would now like to make a lookup on k1. But from the point of view of
lookup how does it know which hash-function to use?


Read the TR1 specs here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
Specifically page 98 has the unordered_set<> header:


namespace std {
namespace tr1 {
// [6.3.4.3] Class template unordered_set
template <class Value,
class Hash = hash<Value>,
class Pred = std::equal_to<Value>,
class Alloc = std::allocator<Value> >
class unordered_set;

section 6.3.1 describes the hash as

3.
Each unordered associative container is parameterized by Key, by a
function object Hash that acts as a hash function
for values of type Key, and by a binary predicate Pred that induces an
equivalence relation on values of type Key.
Additionally, unordered_map and unordered_multimap associate an
arbitrary mapped type T with the Key.

4.
A hash function is a function object that takes a single argument of
type Key and returns a value of type std::size_t.

So to answer your question, the container knows at compile time which
hash function to use.

(e-mail address removed)
http://securitymario.spaces.live.com/

But thats not very universal. The point with universal hashing is to
randomly select a hash-function each time an operation is made (like
insert). This is described in Introduction to Algorithms in Cormen et. al.
 
O

outlaw

You don't choose a new hash function for each insertion.  Each time you
instantiate a new hash table (e.g. each time the program is run), you
choose a new hash function.

No, but the type has to be known at compile time with C++ generic
programming.
Now, What you do in your datatype at runtime is up to you, if you
wanted to simply "return 1;" from the hash function each time, or
return BCryptGenRandom() each time, so be it.
You'd have a linear time ordered container with the former; and
constant time inserts with the latter, without an easy way of
retrieving it afterwards.

The OP seemed to want to do something akin to
std::tr1::unordered_set<HisObjs, new HisHash<HisObjs> >; which isn't
syntactically correct.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,262
Messages
2,571,059
Members
48,769
Latest member
Clifft

Latest Threads

Top