STL hash_map hash

M

Murali

I have a requirement where I have to use two unsigned ints as a key in a STL
hash map.
A couple of ways to do this is
1. create a struct with two unsigned ints and use that as key (write my own
HashFcn and EqualKey template args) or,
2. convert the two unsigned ints to char*s, concatenate them and use that as
Key.

For method 1, the difficulty I am having is in writing the HashFcn. HashFcn
requires the following method
size_t operator()(const T& x)
I cannot create a unique hash for two unsigned ints for the hash_map that is
a size_t.

To try method 2, I thought of first trying with just one unsigned int
converted to a char* and use that as Key.
The result of it was that the hash key generated happened to be the same for
two different unsigned ints
causing one of them to be dropped.

Please suggest solution that would work to create an STL hash_map with 2
unsigned ints as Key.

Here is the code that does that:

#include <iostream>
#include <ext/hash_map>

using namespace std;
using namespace __gnu_cxx;

struct eqstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) == 0;
}
};

int main()
{
hash_map<const char*, unsigned int, hash<const char*>, eqstr> hash_obj;

hash<const char*> hashfun;

char key[32];

sprintf(key, "%u", (((unsigned int)~0) / 2) + 250000000);
hash_obj[key] = (((unsigned int)~0) / 2) + 250000000;
cout << "Hash for " << key << " = " << hashfun(key) << endl;

sprintf(key, "%u", (((unsigned int)~0) / 2) + 300000000);
hash_obj[key] = (((unsigned int)~0) / 2) + 300000000;
cout << "Hash for " << key << " = " << hashfun(key) << endl;

for(hash_map<const char*, unsigned int, hash<const char*>,
eqstr>::const_iterator i = hash_obj.begin();
i != hash_obj.end(); i++)
{
cout << "hash_obj[" << (*i).first << "] = " << (*i).second << endl;
}
}

/* g++ /usr/include/c++/3.3 -o post post.cc -lstdc++ */

The output was
Hash for 2397483647 = 123096165
Hash for 2447483647 = 123096165
hash_obj[2447483647] = 2447483647
 
D

David Harmon

I cannot create a unique hash for two unsigned ints for the hash_map that is
a size_t.

You fundamentally cannot create a unique hash of size N bits for all
keys of size M > N bits. That is part of what hashing is all about.
You want a hash function such that the mapping of the keys to the hash
values is uncorrelated with and looks random relative to the key values
that you are likely to encounter.

Doesn't hash_map come with a suitable default hashing function? It's
not part of standard C++ yet so I wouldn't know. ;-)

Lacking a default I would probably try using CRC32 over all the bytes of
the key. Google "CRC32 source code"
The result of it was that the hash key generated happened to be the same for
two different unsigned ints causing one of them to be dropped.

Neither should be dropped if EqualKey says they are not equal, even if
the hash is the same.

Don't waste time converting your keys to strings, if you can help it.
 
J

John Harrison

Murali said:
I have a requirement where I have to use two unsigned ints as a key in a STL
hash map.
A couple of ways to do this is
1. create a struct with two unsigned ints and use that as key (write my own
HashFcn and EqualKey template args) or,
2. convert the two unsigned ints to char*s, concatenate them and use that as
Key.

For method 1, the difficulty I am having is in writing the HashFcn. HashFcn
requires the following method
size_t operator()(const T& x)
I cannot create a unique hash for two unsigned ints for the hash_map that is
a size_t.

You don't have to create a unique hash. Hash tables work perfectly well
without unique hash values. Just exclusive-or the two integers together,
that's a prefectly good hash function.
To try method 2, I thought of first trying with just one unsigned int
converted to a char* and use that as Key.
The result of it was that the hash key generated happened to be the same for
two different unsigned ints
causing one of them to be dropped.

No that does not happen when you have duplicate hash values. Duplicate hash
values are called 'collisions'. The skill in writing a hashing algorithm is
in dealing with the collisions, but they do not stop a hash table working.
Please suggest solution that would work to create an STL hash_map with 2
unsigned ints as Key.

Here is the code that does that:

#include <iostream>
#include <ext/hash_map>

using namespace std;
using namespace __gnu_cxx;

struct eqstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) == 0;
}
};

int main()
{
hash_map<const char*, unsigned int, hash<const char*>, eqstr> hash_obj;

hash<const char*> hashfun;

char key[32];

sprintf(key, "%u", (((unsigned int)~0) / 2) + 250000000);
hash_obj[key] = (((unsigned int)~0) / 2) + 250000000;
cout << "Hash for " << key << " = " << hashfun(key) << endl;

sprintf(key, "%u", (((unsigned int)~0) / 2) + 300000000);

Here's your problem, you are using key as your key, but you are overwriting
it when you add the second value.

Try this

char key1[32];
char key2[32];

sprintf(key1, "%u", (((unsigned int)~0) / 2) + 250000000);
hash_obj[key1] = (((unsigned int)~0) / 2) + 250000000;
cout << "Hash for " << key << " = " << hashfun(key1) << endl;
sprintf(key2, "%u", (((unsigned int)~0) / 2) + 300000000);
hash_obj[key2] = (((unsigned int)~0) / 2) + 300000000;
cout << "Hash for " << key << " = " << hashfun(key2) << endl;>

or just go back to using two unsigned int, instead of strings.

Incidentally how dod you think that the STL generates unique hash values for
indefinte length strings and fits them into a size_t? Impossible, no?
hash_obj[key] = (((unsigned int)~0) / 2) + 300000000;
cout << "Hash for " << key << " = " << hashfun(key) << endl;>
for(hash_map<const char*, unsigned int, hash<const char*>,
eqstr>::const_iterator i = hash_obj.begin();
i != hash_obj.end(); i++)
{
cout << "hash_obj[" << (*i).first << "] = " << (*i).second << endl;
}
}

/* g++ /usr/include/c++/3.3 -o post post.cc -lstdc++ */

The output was
Hash for 2397483647 = 123096165
Hash for 2447483647 = 123096165
hash_obj[2447483647] = 2447483647

john
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top