Perfect Hashfunctions

Discussion in 'C Programming' started by Johannes Schaub (litb), Nov 13, 2010.

  1. I want to learn how to create perfect hash functions for switch statements
    over strings. So given

    switch(str) {
    case str1...
    case str2...
    ...
    case strN...
    }

    I want to create a perfect hash function that gives me an unique integer for
    each of those strI, in the range of [0 ... M] (M should preferably be
    small).

    However, I can't for the life of me find any tutorial or documentation on
    how to build up such a function. All I find are statements that I can use a
    two-stage hashing system, and use a certain function for the first stage and
    then for the second stage finding the function from a set of "universal
    hashing functions" by randomly testing whether a given function will be
    perfect. This sounds like "let's try randomly, and if there is no function
    that works, we are just shit-outta-luck"

    I have no idea what an universal hashing function really is about and
    whether that is the only way of creating a perfect hash function, too: I
    read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
    clue how they relate to the big picture.

    Can anyone shed some light on this, please? Is it really difficult to do a
    perfect hashing?
     
    Johannes Schaub (litb), Nov 13, 2010
    #1
    1. Advertising

  2. Johannes Schaub (litb)

    Gene Guest

    On Nov 12, 10:44 pm, "Johannes Schaub (litb)" <>
    wrote:
    > I want to learn how to create perfect hash functions for switch statements
    > over strings. So given
    >
    > switch(str) {
    >   case str1...
    >   case str2...
    >   ...
    >   case strN...
    >
    > }
    >
    > I want to create a perfect hash function that gives me an unique integer for
    > each of those strI, in the range of [0 ... M] (M should preferably be
    > small).
    >
    > However, I can't for the life of me find any tutorial or documentation on
    > how to build up such a function. All I find are statements that I can use a
    > two-stage hashing system, and use a certain function for the first stage and
    > then for the second stage finding the function from a set of "universal
    > hashing functions" by randomly testing whether a given function will be
    > perfect. This sounds like "let's try randomly, and if there is no function
    > that works, we are just shit-outta-luck"
    >
    > I have no idea what an universal hashing function really is about and
    > whether that is the only way of creating a perfect hash function, too: I
    > read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
    > clue how they relate to the big picture.
    >
    > Can anyone shed some light on this, please? Is it really difficult to do a
    > perfect hashing?


    gperf works pretty well. As I recall the documentation has some
    references to algorithms, but it's been a while.
     
    Gene, Nov 13, 2010
    #2
    1. Advertising

  3. "Johannes Schaub (litb)" <> writes:
    > I want to learn how to create perfect hash functions for switch statements
    > over strings. So given
    >
    > switch(str) {
    > case str1...
    > case str2...
    > ...
    > case strN...
    > }
    >
    > I want to create a perfect hash function that gives me an unique integer for
    > each of those strI, in the range of [0 ... M] (M should preferably be
    > small).

    [...]

    Your question is probably more about algorithms than about C. You might
    get better information in comp.programming.

    And, as Gene said, gperf is probably a good place to start. (I haven't
    used it myself.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 13, 2010
    #3
  4. Keith Thompson <> writes:
    > "Johannes Schaub (litb)" <> writes:
    >> I want to learn how to create perfect hash functions for switch statements
    >> over strings.

    [...]
    > Your question is probably more about algorithms than about C. You might
    > get better information in comp.programming.


    Sorry, I didn't notice the cross-post.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 13, 2010
    #4
  5. On Nov 12, 9:44 pm, "Johannes Schaub (litb)" <>
    wrote:
    > I want to learn how to create perfect hash functions for switch statements
    > over strings. So given
    >
    > switch(str) {
    >   case str1...
    >   case str2...
    >   ...
    >   case strN...
    >
    > }


    I don't understand the example. Why are you worried about switch
    statements? As far as I know, the term 'perfect hash' refers to
    a mapping between strings and integers. And the switch statement
    gives you a mapping between integers and code sequences.
    Would this pseudocode more accurately reflect what you want?

    enum { str1, str2, str3, ..., strN };
    ph = makeperfecthash(set of strings)
    ....
    switch (applyhash(ph, str)) {
    case str1: ...
    case str2: ...
    ...
    case strN: ...
    }

    There are at least 2 pieces: building (or picking) the hash
    function which will perform perfectly over a known set of
    strings, and actually executing that hash for a specific
    string to get a specific integer.

    > I want to create a perfect hash function that gives me an unique integer for
    > each of those strI, in the range of [0 ... M] (M should preferably be
    > small).


    and candy-coated. ;)

    > However, I can't for the life of me find any tutorial or documentation on
    > how to build up such a function. All I find are statements that I can use a
    > two-stage hashing system, and use a certain function for the first stage and
    > then for the second stage finding the function from a set of "universal
    > hashing functions" by randomly testing whether a given function will be
    > perfect. This sounds like "let's try randomly, and if there is no function
    > that works, we are just shit-outta-luck"


    Well, yeah. What did you expect?

    > I have no idea what an universal hashing function really is about and
    > whether that is the only way of creating a perfect hash function, too: I
    > read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
    > clue how they relate to the big picture.


    This part is out of my league.

    > Can anyone shed some light on this, please? Is it really difficult to do a
    > perfect hashing?


    If I haven't shed any light on the real issue,
    hopefully I've helped clarify the question.

    --
    Do not trouble me with Faramir,
    I know his uses and they are few.
     
    luser- -droog, Nov 13, 2010
    #5
  6. Johannes Schaub (litb)

    Nick Guest

    Keith Thompson <> writes:

    > "Johannes Schaub (litb)" <> writes:
    >> I want to learn how to create perfect hash functions for switch statements
    >> over strings. So given
    >>
    >> switch(str) {
    >> case str1...
    >> case str2...
    >> ...
    >> case strN...
    >> }
    >>
    >> I want to create a perfect hash function that gives me an unique integer for
    >> each of those strI, in the range of [0 ... M] (M should preferably be
    >> small).

    > [...]
    >
    > And, as Gene said, gperf is probably a good place to start. (I haven't
    > used it myself.)


    I have, although some times ago. It works fine.

    Essentially gperf generates the function for you and then you switch on
    0..M. Of course, if you plan to change the strings you need to generate
    a new function, so you end up building it into your build process,
    making it easy to change the strings. If you don't, then its ability to
    generate the switch statement for you might be useful.
    --
    Online waterways route planner | http://canalplan.eu
    Plan trips, see photos, check facilities | http://canalplan.org.uk
     
    Nick, Nov 13, 2010
    #6
  7. Johannes Schaub (litb)

    Ian Collins Guest

    On 11/13/10 04:44 PM, Johannes Schaub (litb) wrote:
    > I want to learn how to create perfect hash functions for switch statements
    > over strings. So given
    >
    > switch(str) {
    > case str1...
    > case str2...
    > ...
    > case strN...
    > }
    >
    > I want to create a perfect hash function that gives me an unique integer for
    > each of those strI, in the range of [0 ... M] (M should preferably be
    > small).
    >
    > However, I can't for the life of me find any tutorial or documentation on
    > how to build up such a function. All I find are statements that I can use a
    > two-stage hashing system, and use a certain function for the first stage and
    > then for the second stage finding the function from a set of "universal
    > hashing functions" by randomly testing whether a given function will be
    > perfect. This sounds like "let's try randomly, and if there is no function
    > that works, we are just shit-outta-luck"
    >
    > I have no idea what an universal hashing function really is about and
    > whether that is the only way of creating a perfect hash function, too: I
    > read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
    > clue how they relate to the big picture.
    >
    > Can anyone shed some light on this, please? Is it really difficult to do a
    > perfect hashing?


    It's probably fair to say that as the hash algorithm approaches
    perfection, its complexity approaches infinity for an unbounded set of
    input strings. You should be able to define a relatively simple
    algorithm for an equally small set of input strings.

    --
    Ian Collins
     
    Ian Collins, Nov 13, 2010
    #7
  8. Johannes Schaub (litb)

    Dann Corbit Guest

    In article <ibl1ck$tsu$03$-online.com>,
    says...
    >
    > I want to learn how to create perfect hash functions for switch statements
    > over strings. So given
    >
    > switch(str) {
    > case str1...
    > case str2...
    > ...
    > case strN...
    > }
    >
    > I want to create a perfect hash function that gives me an unique integer for
    > each of those strI, in the range of [0 ... M] (M should preferably be
    > small).
    >
    > However, I can't for the life of me find any tutorial or documentation on
    > how to build up such a function. All I find are statements that I can use a
    > two-stage hashing system, and use a certain function for the first stage and
    > then for the second stage finding the function from a set of "universal
    > hashing functions" by randomly testing whether a given function will be
    > perfect. This sounds like "let's try randomly, and if there is no function
    > that works, we are just shit-outta-luck"
    >
    > I have no idea what an universal hashing function really is about and
    > whether that is the only way of creating a perfect hash function, too: I
    > read about CHD, BDZ, BMZ, CHM, BRZ and FCH on the page of cmph, but I've no
    > clue how they relate to the big picture.
    >
    > Can anyone shed some light on this, please? Is it really difficult to do a
    > perfect hashing?


    Google will turn up plenty.
    There are some on sourceforge.
    Bob Jenkins' site has one.
     
    Dann Corbit, Nov 13, 2010
    #8
  9. luser- -droog wrote:

    > On Nov 12, 9:44 pm, "Johannes Schaub (litb)" <>
    > wrote:
    >> However, I can't for the life of me find any tutorial or documentation on
    >> how to build up such a function. All I find are statements that I can use
    >> a two-stage hashing system, and use a certain function for the first
    >> stage and then for the second stage finding the function from a set of
    >> "universal hashing functions" by randomly testing whether a given
    >> function will be perfect. This sounds like "let's try randomly, and if
    >> there is no function that works, we are just shit-outta-luck"

    >
    > Well, yeah. What did you expect?
    >


    Someone told me that there is a perfect hash for any given finite set of
    fixed strings. But that by-luck way does't look like that.

    >> Can anyone shed some light on this, please? Is it really difficult to do
    >> a perfect hashing?

    >
    > If I haven't shed any light on the real issue,
    > hopefully I've helped clarify the question.
    >


    Yes, thanks. The way you rewrote the switch statement made my intent more
    clear. That was what I wanted to do ultimately (for implementing switches in
    my language).
     
    Johannes Schaub (litb), Nov 13, 2010
    #9
  10. Johannes Schaub (litb)

    BartC Guest

    "Johannes Schaub (litb)" <> wrote in message
    news:ibl1ck$tsu$03$-online.com...
    > I want to learn how to create perfect hash functions for switch statements
    > over strings. So given
    >
    > switch(str) {
    > case str1...
    > case str2...
    > ...
    > case strN...
    > }
    >
    > I want to create a perfect hash function that gives me an unique integer
    > for
    > each of those strI, in the range of [0 ... M] (M should preferably be
    > small).
    >
    > However, I can't for the life of me find any tutorial or documentation on
    > how to build up such a function. All I find are statements that I can use
    > a
    > two-stage hashing system, and use a certain function for the first stage
    > and
    > then for the second stage finding the function from a set of "universal
    > hashing functions" by randomly testing whether a given function will be
    > perfect. This sounds like "let's try randomly, and if there is no function
    > that works, we are just shit-outta-luck"


    The random approach sounds fine to me, if the strings aren't going to change
    that often (and you would only do this once the code is stable):

    Use some variation of your favourite hashing scheme; choose some size of
    hash table; test that all your strings have a unique hash index; if not, try
    something a little different.

    Although, this approach means that for your test string, you do need to
    check that it is in fact the same string as the one it matches in hash index
    (maybe a perfect hash needs this check too?).

    (I've looked at gperf, and even after getting past the six pages of licence
    agreements, I found it incomprehensible. I think I would rather take a
    chance with my own code.)

    --
    Bartc
     
    BartC, Nov 13, 2010
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Hitesh Panchal

    Not Getting perfect path

    Hitesh Panchal, Mar 3, 2004, in forum: ASP .Net
    Replies:
    1
    Views:
    331
    Mohamed El Ashmawy
    Mar 3, 2004
  2. =?Utf-8?B?QW5keQ==?=

    How can I perfect the look of my asp.net site?

    =?Utf-8?B?QW5keQ==?=, May 23, 2005, in forum: ASP .Net
    Replies:
    2
    Views:
    454
    Kevin Spencer
    May 23, 2005
  3. Replies:
    0
    Views:
    8,317
  4. Terrence Cooter

    PERFECT REIGN WHOIS DATA IS BOGUS

    Terrence Cooter, Sep 28, 2005, in forum: HTML
    Replies:
    11
    Views:
    875
    Tyrone M. Pierce
    Sep 30, 2005
  5. Johannes Schaub (litb)

    Perfect Hashfunctions

    Johannes Schaub (litb), Nov 13, 2010, in forum: C++
    Replies:
    9
    Views:
    286
    BartC
    Nov 13, 2010
Loading...

Share This Page