Hash Map slower than Map.

A

Amit Bhatia

Hi,
I am trying to write a hash function that works on a pair of integers.

so I have a pair<int,int> of integers. Let the first element be called
"a" and second as "b".

I know before hand that:

0<=a<30
-2^a<=b<=2^a-1.

In more detail, I am using this to keep a record of lattice points seen
on a multi resolution hierarchical grid. A resolution level "l" has
2^{l*d} elements where d is the dimension of the space.

Also, there can be at most two different instantiations of the same
region in the grid for my work. To distinguish them, I add +1 and put
-ve sign in front of one copy.
So <a,b>, and <a,-(b+1)> represent the same region in the grid.
As a result, there can be a key <4,3> and another key <4,-3>


If all the b's were >0, then I could already think of a very simple
perfect hash function:


size_t value = 1<<a+b;

However, if b<0 is also possible, then, I can only think of

size_t value;
if(b>=0) value = 1<<a+b;
else value = 2^30 + 1<<a+-(b+1); // I know a<30 for sure.

Now I used this in my code, and did some profiling and I found that I am
spending almost 80 % of the time just looking up if a key exists in this
hash map. (there are no other major calculations in the program function
where this "find" operation is called).

So for example, in one particular execution of the program, the function
which calls this key lookup is called

158905316 times and takes 100 sec (~80%) of the time.

My question is: can I do much faster than this? I am not an expert, and
I am using g++ 4.2 on a pentium 4 2.4 Ghz with 512 MB RAM. But still
this should be much faster.

What is really surprising is that if I use maps instead, they are
running faster.

So for about, 83127320 calls, I take 16.06 sec (~47.26%) if I use a stl map.

thanks,
-amit.
 
G

Gianni Mariani

Amit said:
size_t value;
if(b>=0) value = 1<<a+b;
else value = 2^30 + 1<<a+-(b+1); // I know a<30 for sure.

This would yield very poor keys.

Why not print out the values from your hash function. Your problem is
probably there.
 
G

Guest

Hi,
I am trying to write a hash function that works on a pair of integers.

so I have a pair<int,int> of integers. Let the first element be called
"a" and second as "b".

Do you want to create a map from a to b, or from (a,b) to c, for some
other c. Or put another way, is a the key or is it a and b?
I know before hand that:

0<=a<30
-2^a<=b<=2^a-1.
If all the b's were >0, then I could already think of a very simple
perfect hash function:


size_t value = 1<<a+b;

For all values of a > 4 that could result in 0 since 2^5 = 32.
However, if b<0 is also possible, then, I can only think of

size_t value;
if(b>=0) value = 1<<a+b;
else value = 2^30 + 1<<a+-(b+1); // I know a<30 for sure.

I do not know what makes a good hash-value, but a fast operation that
will create something that is probably quite unique for each pair is to
simply xor them together. One op with no conditionals.
Now I used this in my code, and did some profiling and I found that I am
spending almost 80 % of the time just looking up if a key exists in this
hash map. (there are no other major calculations in the program function
where this "find" operation is called).

So for example, in one particular execution of the program, the function
which calls this key lookup is called

158905316 times and takes 100 sec (~80%) of the time.

My question is: can I do much faster than this? I am not an expert, and
I am using g++ 4.2 on a pentium 4 2.4 Ghz with 512 MB RAM. But still
this should be much faster.

What is really surprising is that if I use maps instead, they are
running faster.

So for about, 83127320 calls, I take 16.06 sec (~47.26%) if I use a stl map.

If you create a map from a to b then std::map will probably always be
faster than, or at least competitive with, a hash map since the number
of mapped elements is so small. To make it even faster use a fixed size
array (std::vector).
 
A

Amit Bhatia

Erik said:
Do you want to create a map from a to b, or from (a,b) to c, for some
other c. Or put another way, is a the key or is it a and b?

My key is the pair (a,b) which maps to some other object c.
For all values of a > 4 that could result in 0 since 2^5 = 32.

From what I understand size_t is unsigned long (?), so it should have
atleast 32 bits available?
I do not know what makes a good hash-value, but a fast operation that
will create something that is probably quite unique for each pair is to
simply xor them together. One op with no conditionals.
O.k. I am not clear though how it will ensure uniques mapping of the
pair most of the time. I will think more about it.
If you create a map from a to b then std::map will probably always be
faster than, or at least competitive with, a hash map since the number
of mapped elements is so small. To make it even faster use a fixed size
array (std::vector).

Vectors are not a good idea, since I do not know apriori how many
objects I will end up creating to be put into the hash_map. It can be
true that the number of elements in hash_map is small. So for 5
dimensional grid, I end up with something like ~15000 elements inside
the hash_map in the end. But still potentially in the worst case (which
is very unlikely though), I could have to put every single region from
the grid into the hash_map, which means for this particular case at
resolution level l : 2^{5l} objects.

thanks,
amit.
 
B

Ben Rudiak-Gould

Amit said:
If all the b's were >0, then I could already think of a very simple
perfect hash function:

size_t value = 1<<a+b;

+ has higher precedence than << in C++, so the expression above means
1<<(a+b), which is probably not what you intended.

-- Ben
 
A

Amit Bhatia

Ben said:
+ has higher precedence than << in C++, so the expression above means
1<<(a+b), which is probably not what you intended.

-- Ben

Thanks a lot. It works a lot better now. :) For the same example, for
about 83805820 calls it takes about 12.85 seconds. There are two
distinct hash maps, and each has about ~1800 objects.

I am wondering is there a better function that is even faster than what
I am doing? When b<0, since I add 2^30, it kind of takes the value to
very huge numbers. But I am not sure what can be used. Most of the
examples I have seen online talk about mapping strings to keys.

thanks,
amit.
 
J

James Kanze

Thanks a lot. It works a lot better now. :) For the same example, for
about 83805820 calls it takes about 12.85 seconds. There are two
distinct hash maps, and each has about ~1800 objects.
I am wondering is there a better function that is even faster than what
I am doing? When b<0, since I add 2^30, it kind of takes the value to
very huge numbers.

I'm not sure about what you're doing, but if you actually have
written 2^32, the value is the constant 34 in C++. Which may
not be what you wanted.
But I am not sure what can be used. Most of the examples I
have seen online talk about mapping strings to keys.

Most hashing functions (even for strings) actually hash
sequences of unsigned integral values. In your place, I'd just
convert the two values to unsigned, then adopt one of the
classical hash codes, say by multiplying the first by a prime
number, then adding the second (e.g.: "127 * (unsigned)( a ) +
(unsigned)( b )").

Most hash tables I know have functions which allow you to verify
the distribution, but a priori, something like the above should
give very good results, and is not very expensive to calculate.
(On machines with slow multiplication, the compiler will convert
the 127*x to a left shift and a subtraction.)
 
G

Guest

My key is the pair (a,b) which maps to some other object c.

From what I understand size_t is unsigned long (?), so it should have
atleast 32 bits available?

size_t is an unsigned integer type "large enough", the actual size is
platform dependent, on most 32-bit platforms it is probably 32 bits.
What I meant was that when you do 1 << (a + b), if a is 5 (or larger)
than b is 32 or larger, which means that you shift 1 left 32 times (or
more) which means that you have shifted the 1 out of the results. I was
wrong however, it will not result in 0,but undefined behaviour.
O.k. I am not clear though how it will ensure uniques mapping of the
pair most of the time. I will think more about it.

The value of a will probably not be unique very often since it is so
limited in range, however the value of b have a larger range and a much
greater chance of being unique, and it will certainly be unique for each
value of a.
Vectors are not a good idea, since I do not know apriori how many
objects I will end up creating to be put into the hash_map. It can be
true that the number of elements in hash_map is small. So for 5
dimensional grid, I end up with something like ~15000 elements inside
the hash_map in the end. But still potentially in the worst case (which
is very unlikely though), I could have to put every single region from
the grid into the hash_map, which means for this particular case at
resolution level l : 2^{5l} objects.

No, a vector would only work if you mapped from a to b.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,139
Latest member
JamaalCald
Top