Perfect Hashfunctions

Johannes Schaub (litb) · Nov 13, 2010

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

Gene · Nov 13, 2010

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...

}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

gperf works pretty well. As I recall the documentation has some
references to algorithms, but it's been a while.

Keith Thompson · Nov 13, 2010

Johannes Schaub (litb) said:
I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

[...]

Your question is probably more about algorithms than about C. You might
get better information in comp.programming.

And, as Gene said, gperf is probably a good place to start. (I haven't
used it myself.)

Keith Thompson · Nov 13, 2010

Keith Thompson said:
Johannes Schaub (litb) said:

I want to learn how to create perfect hash functions for switch statements
over strings.

Click to expand...

[...]
Your question is probably more about algorithms than about C. You might
get better information in comp.programming.

Sorry, I didn't notice the cross-post.

luser- -droog · Nov 13, 2010

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...

}

I don't understand the example. Why are you worried about switch
statements? As far as I know, the term 'perfect hash' refers to
a mapping between strings and integers. And the switch statement
gives you a mapping between integers and code sequences.
Would this pseudocode more accurately reflect what you want?

enum { str1, str2, str3, ..., strN };
ph = makeperfecthash(set of strings)
....
switch (applyhash(ph, str)) {
case str1: ...
case str2: ...
...
case strN: ...
}

There are at least 2 pieces: building (or picking) the hash
function which will perform perfectly over a known set of
strings, and actually executing that hash for a specific
string to get a specific integer.

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

and candy-coated.

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

Well, yeah. What did you expect?

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

This part is out of my league.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

If I haven't shed any light on the real issue,
hopefully I've helped clarify the question.

Nick · Nov 13, 2010

Keith Thompson said:
Johannes Schaub (litb) said:

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

Click to expand...

[...]

And, as Gene said, gperf is probably a good place to start. (I haven't
used it myself.)

I have, although some times ago. It works fine.

Essentially gperf generates the function for you and then you switch on
0..M. Of course, if you plan to change the strings you need to generate
a new function, so you end up building it into your build process,
making it easy to change the strings. If you don't, then its ability to
generate the switch statement for you might be useful.

Ian Collins · Nov 13, 2010

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

It's probably fair to say that as the hash algorithm approaches
perfection, its complexity approaches infinity for an unbounded set of
input strings. You should be able to define a relatively simple
algorithm for an equally small set of input strings.

Dann Corbit · Nov 13, 2010

I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use a
two-stage hashing system, and use a certain function for the first stage and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

I have no idea what an universal hashing function really is about and
whether that is the only way of creating a perfect hash function, too: I
read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
clue how they relate to the big picture.

Can anyone shed some light on this, please? Is it really difficult to do a
perfect hashing?

Google will turn up plenty.
There are some on sourceforge.
Bob Jenkins' site has one.

Johannes Schaub (litb) · Nov 13, 2010

luser- -droog said:
Well, yeah. What did you expect?

Someone told me that there is a perfect hash for any given finite set of
fixed strings. But that by-luck way does't look like that.

If I haven't shed any light on the real issue,
hopefully I've helped clarify the question.

Yes, thanks. The way you rewrote the switch statement made my intent more
clear. That was what I wanted to do ultimately (for implementing switches in
my language).

BartC · Nov 13, 2010

Johannes Schaub (litb) said:
I want to learn how to create perfect hash functions for switch statements
over strings. So given

switch(str) {
case str1...
case str2...
...
case strN...
}

I want to create a perfect hash function that gives me an unique integer
for
each of those strI, in the range of [0 ... M] (M should preferably be
small).

However, I can't for the life of me find any tutorial or documentation on
how to build up such a function. All I find are statements that I can use
a
two-stage hashing system, and use a certain function for the first stage
and
then for the second stage finding the function from a set of "universal
hashing functions" by randomly testing whether a given function will be
perfect. This sounds like "let's try randomly, and if there is no function
that works, we are just shit-outta-luck"

The random approach sounds fine to me, if the strings aren't going to change
that often (and you would only do this once the code is stable):

Use some variation of your favourite hashing scheme; choose some size of
hash table; test that all your strings have a unique hash index; if not, try
something a little different.

Although, this approach means that for your test string, you do need to
check that it is in fact the same string as the one it matches in hash index
(maybe a perfect hash needs this check too?).

(I've looked at gperf, and even after getting past the six pages of licence
agreements, I found it incomprehensible. I think I would rather take a
chance with my own code.)

Perfect Forwarding in Runtime (rvalue reference)	3	Jun 28, 2012
Tasks	1	Nov 29, 2022
Did you know that there is a match-case function in python?	4	Dec 17, 2023
Container Interfaces	17	May 9, 2010
RFC: Building the Perfect Tabbed Pane (an tutorial article)	69	Feb 13, 2008
Universal hashing	5	Apr 7, 2008
How is Hash and Salt Computed Using ASP .net 2.0 built-in Controls	0	Mar 7, 2007
should we Go now?	12	Feb 24, 2013

Perfect Hashfunctions

Johannes Schaub (litb)

Gene

Keith Thompson

Keith Thompson

luser- -droog

Nick

Ian Collins

Dann Corbit

Johannes Schaub (litb)

BartC

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads