# Perfect Hashfunctions

J

#### Johannes Schaub (litb)

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

G

#### Gene

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...

}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

gperf works pretty well. As I recall the documentation has some
references to algorithms, but it's been a while.

K

#### Keith Thompson

Johannes Schaub (litb) said:
I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).
[...]

get better information in comp.programming.

And, as Gene said, gperf is probably a good place to start. (I haven't
used it myself.)

K

#### Keith Thompson

Keith Thompson said:
Johannes Schaub (litb) said:
I want to learn how to create perfect hash functions for switch statements
over strings.
[...]
get better information in comp.programming.

Sorry, I didn't notice the cross-post.

L

#### luser- -droog

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...

}

I don't understand the example. Why are you worried about switch
statements? As far as I know, the term 'perfect hash' refers to
a mapping between strings and integers. And the switch statement
gives you a mapping between integers and code sequences.
Would this pseudocode more accurately reflect what you want?

enum { str1, str2, str3, ..., strN };
ph = makeperfecthash(set of strings)
....
switch (applyhash(ph, str)) {
case str1: ...
case str2: ...
...
case strN: ...
}

There are at least 2 pieces: building (or picking) the hash
function which will perform perfectly over a known set of
strings, and actually executing that hash for a specific
string to get a specific integer.
I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

and candy-coated.
However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

Well, yeah. What did you expect?
I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

This part is out of my league.
Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

If I haven't shed any light on the real issue,
hopefully I've helped clarify the question.

N

#### Nick

Keith Thompson said:
Johannes Schaub (litb) said:
I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).
[...]

And, as Gene said, gperf is probably a good place to start. (I haven't
used it myself.)

I have, although some times ago. It works fine.

Essentially gperf generates the function for you and then you switch on
0..M. Of course, if you plan to change the strings you need to generate
a new function, so you end up building it into your build process,
making it easy to change the strings. If you don't, then its ability to
generate the switch statement for you might be useful.

I

#### Ian Collins

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

It's probably fair to say that as the hash algorithm approaches
perfection, its complexity approaches infinity for an unbounded set of
input strings. You should be able to define a relatively simple
algorithm for an equally small set of input strings.

D

#### Dann Corbit

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

There are some on sourceforge.
Bob Jenkins' site has one.

J

#### Johannes Schaub (litb)

luser- -droog said:
Well, yeah. What did you expect?

Someone told me that there is a perfect hash for any given finite set of
fixed strings. But that by-luck way does't look like that.
If I haven't shed any light on the real issue,
hopefully I've helped clarify the question.

Yes, thanks. The way you rewrote the switch statement made my intent more
clear. That was what I wanted to do ultimately (for implementing switches in
my language).

B

#### BartC

Johannes Schaub (litb) said:
I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer
for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use
a
two-stage hashing system, and use a certain function for the first stage
and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

The random approach sounds fine to me, if the strings aren't going to change
that often (and you would only do this once the code is stable):

Use some variation of your favourite hashing scheme; choose some size of
hash table; test that all your strings have a unique hash index; if not, try
something a little different.

Although, this approach means that for your test string, you do need to
check that it is in fact the same string as the one it matches in hash index
(maybe a perfect hash needs this check too?).

(I've looked at gperf, and even after getting past the six pages of licence
agreements, I found it incomprehensible. I think I would rather take a
chance with my own code.)