# Perfect Hashfunctions

Discussion in 'C Programming' started by Johannes Schaub (litb), Nov 13, 2010.

1. ### Johannes Schaub (litb)Guest

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

Johannes Schaub (litb), Nov 13, 2010

2. ### GeneGuest

On Nov 12, 10:44 pm, "Johannes Schaub (litb)" <>
wrote:
> I want to learn how to create perfect hash functions for switch statements
> over strings. So given
>
> switch(str) {
>   case str1...
>   case str2...
>   ...
>   case strN...
>
> }
>
> I want to create a perfect hash function that gives me an unique integer for
> each of those strI, in the range of [0 ... M] (M should preferably be
> small).
>
> However, I can't for the life of me find any tutorial or documentation on
> how to build up such a function. All I find are statements that I can use a
> two-stage hashing system, and use a certain function for the first stage and
> then for the second stage finding the function from a set of "universal
> hashing functions" by randomly testing whether a given function will be
> perfect. This sounds like "let's try randomly, and if there is no function
> that works, we are just shit-outta-luck"
>
> I have no idea what an universal hashing function really is about and
> whether that is the only way of creating a perfect hash function, too: I
> read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
> clue how they relate to the big picture.
>
> Can anyone shed some light on this, please? Is it really difficult to do a
> perfect hashing?

gperf works pretty well. As I recall the documentation has some
references to algorithms, but it's been a while.

Gene, Nov 13, 2010

3. ### Keith ThompsonGuest

"Johannes Schaub (litb)" <> writes:
> I want to learn how to create perfect hash functions for switch statements
> over strings. So given
>
> switch(str) {
> case str1...
> case str2...
> ...
> case strN...
> }
>
> I want to create a perfect hash function that gives me an unique integer for
> each of those strI, in the range of [0 ... M] (M should preferably be
> small).

[...]

get better information in comp.programming.

And, as Gene said, gperf is probably a good place to start. (I haven't
used it myself.)

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Nov 13, 2010
4. ### Keith ThompsonGuest

Keith Thompson <> writes:
> "Johannes Schaub (litb)" <> writes:
>> I want to learn how to create perfect hash functions for switch statements
>> over strings.

[...]
> get better information in comp.programming.

Sorry, I didn't notice the cross-post.

--
Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson, Nov 13, 2010
5. ### luser- -droogGuest

On Nov 12, 9:44 pm, "Johannes Schaub (litb)" <>
wrote:
> I want to learn how to create perfect hash functions for switch statements
> over strings. So given
>
> switch(str) {
>   case str1...
>   case str2...
>   ...
>   case strN...
>
> }

I don't understand the example. Why are you worried about switch
statements? As far as I know, the term 'perfect hash' refers to
a mapping between strings and integers. And the switch statement
gives you a mapping between integers and code sequences.
Would this pseudocode more accurately reflect what you want?

enum { str1, str2, str3, ..., strN };
ph = makeperfecthash(set of strings)
....
switch (applyhash(ph, str)) {
case str1: ...
case str2: ...
...
case strN: ...
}

There are at least 2 pieces: building (or picking) the hash
function which will perform perfectly over a known set of
strings, and actually executing that hash for a specific
string to get a specific integer.

> I want to create a perfect hash function that gives me an unique integer for
> each of those strI, in the range of [0 ... M] (M should preferably be
> small).

and candy-coated.

> However, I can't for the life of me find any tutorial or documentation on
> how to build up such a function. All I find are statements that I can use a
> two-stage hashing system, and use a certain function for the first stage and
> then for the second stage finding the function from a set of "universal
> hashing functions" by randomly testing whether a given function will be
> perfect. This sounds like "let's try randomly, and if there is no function
> that works, we are just shit-outta-luck"

Well, yeah. What did you expect?

> I have no idea what an universal hashing function really is about and
> whether that is the only way of creating a perfect hash function, too: I
> read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
> clue how they relate to the big picture.

This part is out of my league.

> Can anyone shed some light on this, please? Is it really difficult to do a
> perfect hashing?

If I haven't shed any light on the real issue,
hopefully I've helped clarify the question.

--
Do not trouble me with Faramir,
I know his uses and they are few.

luser- -droog, Nov 13, 2010
6. ### NickGuest

Keith Thompson <> writes:

> "Johannes Schaub (litb)" <> writes:
>> I want to learn how to create perfect hash functions for switch statements
>> over strings. So given
>>
>> switch(str) {
>> case str1...
>> case str2...
>> ...
>> case strN...
>> }
>>
>> I want to create a perfect hash function that gives me an unique integer for
>> each of those strI, in the range of [0 ... M] (M should preferably be
>> small).

> [...]
>
> And, as Gene said, gperf is probably a good place to start. (I haven't
> used it myself.)

I have, although some times ago. It works fine.

Essentially gperf generates the function for you and then you switch on
0..M. Of course, if you plan to change the strings you need to generate
a new function, so you end up building it into your build process,
making it easy to change the strings. If you don't, then its ability to
generate the switch statement for you might be useful.
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk

Nick, Nov 13, 2010
7. ### Ian CollinsGuest

On 11/13/10 04:44 PM, Johannes Schaub (litb) wrote:
> I want to learn how to create perfect hash functions for switch statements
> over strings. So given
>
> switch(str) {
> case str1...
> case str2...
> ...
> case strN...
> }
>
> I want to create a perfect hash function that gives me an unique integer for
> each of those strI, in the range of [0 ... M] (M should preferably be
> small).
>
> However, I can't for the life of me find any tutorial or documentation on
> how to build up such a function. All I find are statements that I can use a
> two-stage hashing system, and use a certain function for the first stage and
> then for the second stage finding the function from a set of "universal
> hashing functions" by randomly testing whether a given function will be
> perfect. This sounds like "let's try randomly, and if there is no function
> that works, we are just shit-outta-luck"
>
> I have no idea what an universal hashing function really is about and
> whether that is the only way of creating a perfect hash function, too: I
> read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
> clue how they relate to the big picture.
>
> Can anyone shed some light on this, please? Is it really difficult to do a
> perfect hashing?

It's probably fair to say that as the hash algorithm approaches
perfection, its complexity approaches infinity for an unbounded set of
input strings. You should be able to define a relatively simple
algorithm for an equally small set of input strings.

--
Ian Collins

Ian Collins, Nov 13, 2010
8. ### Dann CorbitGuest

In article <ibl1ck\$tsu\$03\$-online.com>,
says...
>
> I want to learn how to create perfect hash functions for switch statements
> over strings. So given
>
> switch(str) {
> case str1...
> case str2...
> ...
> case strN...
> }
>
> I want to create a perfect hash function that gives me an unique integer for
> each of those strI, in the range of [0 ... M] (M should preferably be
> small).
>
> However, I can't for the life of me find any tutorial or documentation on
> how to build up such a function. All I find are statements that I can use a
> two-stage hashing system, and use a certain function for the first stage and
> then for the second stage finding the function from a set of "universal
> hashing functions" by randomly testing whether a given function will be
> perfect. This sounds like "let's try randomly, and if there is no function
> that works, we are just shit-outta-luck"
>
> I have no idea what an universal hashing function really is about and
> whether that is the only way of creating a perfect hash function, too: I
> read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
> clue how they relate to the big picture.
>
> Can anyone shed some light on this, please? Is it really difficult to do a
> perfect hashing?

There are some on sourceforge.
Bob Jenkins' site has one.

Dann Corbit, Nov 13, 2010
9. ### Johannes Schaub (litb)Guest

luser- -droog wrote:

> On Nov 12, 9:44 pm, "Johannes Schaub (litb)" <>
> wrote:
>> However, I can't for the life of me find any tutorial or documentation on
>> how to build up such a function. All I find are statements that I can use
>> a two-stage hashing system, and use a certain function for the first
>> stage and then for the second stage finding the function from a set of
>> "universal hashing functions" by randomly testing whether a given
>> function will be perfect. This sounds like "let's try randomly, and if
>> there is no function that works, we are just shit-outta-luck"

>
> Well, yeah. What did you expect?
>

Someone told me that there is a perfect hash for any given finite set of
fixed strings. But that by-luck way does't look like that.

>> Can anyone shed some light on this, please? Is it really difficult to do
>> a perfect hashing?

>
> If I haven't shed any light on the real issue,
> hopefully I've helped clarify the question.
>

Yes, thanks. The way you rewrote the switch statement made my intent more
clear. That was what I wanted to do ultimately (for implementing switches in
my language).

Johannes Schaub (litb), Nov 13, 2010
10. ### BartCGuest

"Johannes Schaub (litb)" <> wrote in message
news:ibl1ck\$tsu\$03\$-online.com...
> I want to learn how to create perfect hash functions for switch statements
> over strings. So given
>
> switch(str) {
> case str1...
> case str2...
> ...
> case strN...
> }
>
> I want to create a perfect hash function that gives me an unique integer
> for
> each of those strI, in the range of [0 ... M] (M should preferably be
> small).
>
> However, I can't for the life of me find any tutorial or documentation on
> how to build up such a function. All I find are statements that I can use
> a
> two-stage hashing system, and use a certain function for the first stage
> and
> then for the second stage finding the function from a set of "universal
> hashing functions" by randomly testing whether a given function will be
> perfect. This sounds like "let's try randomly, and if there is no function
> that works, we are just shit-outta-luck"

The random approach sounds fine to me, if the strings aren't going to change
that often (and you would only do this once the code is stable):

Use some variation of your favourite hashing scheme; choose some size of
hash table; test that all your strings have a unique hash index; if not, try
something a little different.

Although, this approach means that for your test string, you do need to
check that it is in fact the same string as the one it matches in hash index
(maybe a perfect hash needs this check too?).

(I've looked at gperf, and even after getting past the six pages of licence
agreements, I found it incomprehensible. I think I would rather take a
chance with my own code.)

--
Bartc

BartC, Nov 13, 2010