Calculating hashcode in double hashing

M

Mikke

Hello,

I'm using double hashing functions in my hashtable.
I have two following hashfuctions.

public int hashFunction1(int key)
{
return key % arraysize;
}

public int hashFunction2(int key)
{
// Less than array size.
return 5 - key % 5 // returns StepSize
}

The question is that if I'm going to pass the key as String
containing URL (http://localhost/mypage.html) to e.g. WWW page,
how should I actually convert that string to int? I am not sure.

Function declaration would then be following:

public int hashFunction1(String key)
public int hashFunction2(String key)

Can anyone give advise?

Thanks!
 
R

Roedy Green

public int hashFunction2(int key)
{
// Less than array size.
return 5 - key % 5 // returns StepSize
}

modulus is not something you should use in a hashfunction. Reducing
its span in the concern of the Hashtable no the hashfunction. You
want something that spans all 32 bits.

see http://mindprod.com/jgloss/hashcode.html

XORING hashcodes together is a fairly safe way to combine them. use
the ^ operator.
 
M

Mikke

Ok, I must clearify myself.

I'm doing this "double hashing" because I have to implement
a HashTable for both Java and C/C++. C/C++ environment does not support e.g.
STL so, I can not use STL and Java's Hashtable class and it's methods
(or actually try to port in into C/C++) to make code that look same in both
Java and C/C++ versions. That is the reason why I have to code it without
any classlibrary support.

I read from the book by Mark Allen Weiss (Data Structures and
Algorithm Analysin in Java) that in double hashing there are two hashing
functions. The second hashing function should have rules below:
- HashTable size should be prime (count of buckets)
- Hash2 function could be following: Hash2(x) = R - (x mod R), with R a
prime smaller than tablesize (count of buckets)

So, what I don't actually know is that if my key to HashTable is e.g. URL
string and I'm saving www pages into that HashTable, how should I calculate
key from URL like http://localhost/mypage.html?

That's why I'm interested in it.

So, if someone have a code snippets or somethig that calculates
hashvalue using double hashing using Java(without using "too much" Java's
Collection API) or C I'm very interested in it.

Cheers!
 
R

Roedy Green

So, if someone have a code snippets or somethig that calculates
hashvalue using double hashing using Java(without using "too much" Java's
Collection API) or C I'm very interested in it.

Why not look at the HashCode function for String that Java computes?

You could reproduce that.

Please read the links I gave you before. hashCodes have nothing to do
with primes. Hashtables do.

The HashTable collapses the hashcode to its current needs. You don't
buid that into the hashCode function.

Other hashcode functions you could write in C and Java are
CRC-32, Adlerian checksum, MD-5.

rotate and XOR will do pretty well and it is very fast.

here is a 16-bit hash function in assembler I used in the BBL
compiler.
; length in CX
MOV BX,AX ; build hash in BX
ROL BX,1
JCXZ HASH2
HASH1: ; loop for each char
LODSB
XOR BL,AL
ROL BX,1
LOOP HASH1
HASH2:


basically to rotate left your hash so far and xor on the next byte.


Something similar in Java might look like this:

int hash = 0;

for (int i=0; i<b.length; i++ )
{
hash |= b;
boolean sign = hash < 0;
hash <<= 1;
hash |= sign ? 1 : 0;

}


}
 
M

Mikke

OK, thanks!

I thing I have used too much 'ready-to-use' Collections (Java/STL)
and forgot what they actually do inside them.

Cheers!

Roedy said:
So, if someone have a code snippets or somethig that calculates
hashvalue using double hashing using Java(without using "too much" Java's
Collection API) or C I'm very interested in it.


Why not look at the HashCode function for String that Java computes?

You could reproduce that.

Please read the links I gave you before. hashCodes have nothing to do
with primes. Hashtables do.

The HashTable collapses the hashcode to its current needs. You don't
buid that into the hashCode function.

Other hashcode functions you could write in C and Java are
CRC-32, Adlerian checksum, MD-5.

rotate and XOR will do pretty well and it is very fast.

here is a 16-bit hash function in assembler I used in the BBL
compiler.
; length in CX
MOV BX,AX ; build hash in BX
ROL BX,1
JCXZ HASH2
HASH1: ; loop for each char
LODSB
XOR BL,AL
ROL BX,1
LOOP HASH1
HASH2:


basically to rotate left your hash so far and xor on the next byte.


Something similar in Java might look like this:

int hash = 0;

for (int i=0; i<b.length; i++ )
{
hash |= b;
boolean sign = hash < 0;
hash <<= 1;
hash |= sign ? 1 : 0;

}


}
 
C

Chris Uppal

Mikke said:
I'm using double hashing functions in my hashtable.

You know your own data best, but if you haven't actually tested it yet then I
suspect that you'll find that double hashing is a waste of effort. If the hash
function is any good, and you don't overload your table, then single hashing
will usually work perfectly acceptably for nearly all applications. There are
exceptions, of course, and maybe your application is one of them.

A subtle problem with double hashing, BTW, is that it interacts even worse with
the machine's data caching than a normal hash table -- normally once you've
hashed your string (say) you then do a linear scan down a stretch of contiguous
memory looking for the relevant slot -- and that strip of memory will usually
all be in the fastest cache. If you double hash, then the CPU pulls the same
data into its cache, only for the program to jump off and look at other memory
somewhere else in RAM...

The question is that if I'm going to pass the key as String
containing URL (http://localhost/mypage.html) to e.g. WWW page,
how should I actually convert that string to int? I am not sure.

String hashing is a very easy thing to get wrong. There have been many
published hashes that simply don't work very well on naturally occurring data.
(There's a paper somewhere comparing some of the published string hashes, I
don't have it handy but I'll dig it out if anyone is interested.) Here's a 'C'
string hash that I've used before in several applications, and it has always
behaved well. The most difficult "load" I've tested it one was share
identifiers on the Stock Exchange which tend to be just 3 or 4 uppercase
characters so there's not a lot of "spread" to the data.

-- chris

====== C code =======
typedef unsigned long HASH;
typedef unsigned char uchar;
HASH
lc_hash(uchar *ptr)
{
HASH hash;

for (hash = 0; *ptr; ptr++)
{
/* basically a peturbed linear-congruential PRG */
hash += *ptr;
hash *= 597485621;
}

return hash;
}
====================
 
C

Chris Gray

Mikke said:
Hello,

I'm using double hashing functions in my hashtable.
I have two following hashfuctions.

public int hashFunction1(int key)
{
return key % arraysize;
}

public int hashFunction2(int key)
{
// Less than array size.
return 5 - key % 5 // returns StepSize
}

The question is that if I'm going to pass the key as String
containing URL (http://localhost/mypage.html) to e.g. WWW page,
how should I actually convert that string to int? I am not sure.

String.hashCode()?
 
M

Mark Thornton

Mikke said:
Ok, I must clearify myself.

I'm doing this "double hashing" because I have to implement
a HashTable for both Java and C/C++. C/C++ environment does not support e.g.
STL so, I can not use STL and Java's Hashtable class and it's methods
(or actually try to port in into C/C++) to make code that look same in both
Java and C/C++ versions. That is the reason why I have to code it without
any classlibrary support.

I read from the book by Mark Allen Weiss (Data Structures and
Algorithm Analysin in Java) that in double hashing there are two hashing
functions. The second hashing function should have rules below:
- HashTable size should be prime (count of buckets)
- Hash2 function could be following: Hash2(x) = R - (x mod R), with R a
prime smaller than tablesize (count of buckets)

Watch out for negative values of x if you use the % operator (or use R
less than half the table size. Personally I use multipicative hashing
with a table size equal to a power of two and two different multipliers.
The second hash must be odd (just | the hash value with 1).

Mark Thornton
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top