Which hashing function to use?

Discussion in 'C Programming' started by January Weiner, Nov 16, 2007.

  1. Hello,

    I need to use a hashing function for relatively short strings (roughly 16
    characters or less). The data that needs to be accessed via hash is just a
    simple int. I just need to look up values in two dimensional matrices each
    time that I encounter a pair of these strings.

    I would like to keep the code as simple and standard as
    possible, but at the same time to have a reasonable performance.

    While looking through the docs on my system I found a few different
    implementations:
    - hsearch from <search.h>
    - tsearch from <search.h>
    - bsearch from <stdlib.h>

    Which one would you recommend? Or maybe something completly different?
    Right now I just use Perl to do the hashing :) this is nice, but I would
    love to have a small, efficient, ANSI C program.

    Regards,

    January
    January Weiner, Nov 16, 2007
    #1
    1. Advertising

  2. January Weiner

    user923005 Guest

    On Nov 16, 3:42 pm, January Weiner <> wrote:
    > Hello,
    >
    > I need to use a hashing function for relatively short strings (roughly 16
    > characters or less). The data that needs to be accessed via hash is just a
    > simple int. I just need to look up values in two dimensional matrices each
    > time that I encounter a pair of these strings.
    >
    > I would like to keep the code as simple and standard as
    > possible, but at the same time to have a reasonable performance.
    >
    > While looking through the docs on my system I found a few different
    > implementations:
    > - hsearch from <search.h>
    > - tsearch from <search.h>
    > - bsearch from <stdlib.h>
    >
    > Which one would you recommend? Or maybe something completly different?
    > Right now I just use Perl to do the hashing :) this is nice, but I would
    > love to have a small, efficient, ANSI C program.


    Your post is more topical in news:comp.programming {follow-ups set}.

    I do not think that your problem has an answer yet. It will depend on
    lots of things.
    For instance, are the strings a static set that never changes? If so,
    then a perfect hash is your choice.
    There are string searching algorithms faster than a hash table (See
    Sedgewick).

    Do you need to preserve the data (e.g in a key/value pair set on
    disk)?

    "I need a hash, which should I use?" is such an open question that any
    simple answer will probably be lacking.
    user923005, Nov 17, 2007
    #2
    1. Advertising

  3. January Weiner

    CBFalconer Guest

    January Weiner wrote:
    >
    > I need to use a hashing function for relatively short strings
    > (roughly 16 characters or less). The data that needs to be
    > accessed via hash is just a simple int. I just need to look up
    > values in two dimensional matrices each time that I encounter a
    > pair of these strings.
    >
    > I would like to keep the code as simple and standard as
    > possible, but at the same time to have a reasonable performance.
    >
    > While looking through the docs on my system I found a few
    > different implementations:
    > - hsearch from <search.h>
    > - tsearch from <search.h>
    > - bsearch from <stdlib.h>
    >
    > Which one would you recommend? Or maybe something completly
    > different? Right now I just use Perl to do the hashing :) this
    > is nice, but I would love to have a small, efficient, ANSI C
    > program.


    There are a couple of very simple, but efficient, string hashing
    routines available (in source form) in hashlib.zip. Although
    hashlib is copyright and released under GPL, those functions are
    pretty common knowledge and there will be no objections to
    including them in anything else. You can find hashlib.zip at:

    <http://cbfalconer.home.att.net/download/>

    --
    Chuck F (cbfalconer at maineline dot net)
    <http://cbfalconer.home.att.net>
    Try the download section.



    --
    Posted via a free Usenet account from http://www.teranews.com
    CBFalconer, Nov 17, 2007
    #3
  4. user923005 <> wrote:
    > Your post is more topical in news:comp.programming {follow-ups set}.


    Well, it is not. Not 100%. The reason being that I am looking for
    something that is part of the C standard and that is simple to use in C.
    What do you think happens when I ask a question like that in
    comp.programming? Some wise guy will set the Followup-To to comp.lang.c
    :p, that's what would happen. Finally, the reason is that I am not a
    computer scientist and need, hm, very practical advice. Practical advice
    is something that I always got plenty on comp.lang.c :) Note that you will
    find a discussion on different hashing algorithms in the FAQ of this group
    -- however, I do not fully understand it (otherwise I would not be asking
    the questions).

    > I do not think that your problem has an answer yet. It will depend on
    > lots of things.
    > For instance, are the strings a static set that never changes? If so,
    > then a perfect hash is your choice.
    > There are string searching algorithms faster than a hash table (See
    > Sedgewick).


    OK, as I am not a computer scientist, I will try to look it up somewhere.
    Meanwhile, which of that is a part of standard C libraries?

    > Do you need to preserve the data (e.g in a key/value pair set on
    > disk)?


    Nope, I would go for DBM or SQL if I needed that.

    > "I need a hash, which should I use?" is such an open question that any
    > simple answer will probably be lacking.


    "I need a hash, which of the ones in the standard C libraries should I use"
    is a more specific question, is it not?

    Kind regards,

    January
    January Weiner, Nov 17, 2007
    #4
  5. January Weiner wrote:
    > user923005 <> wrote:
    >> Your post is more topical in news:comp.programming {follow-ups set}.

    [...]
    >> "I need a hash, which should I use?" is such an open question that any
    >> simple answer will probably be lacking.

    >
    > "I need a hash, which of the ones in the standard C libraries should I use"
    > is a more specific question, is it not?


    Yes, it is. And the answer is, there are no hash functions in the
    standard C library.

    --
    Keith Thompson (The_Other_Keith) <>
    Looking for software development work in the San Diego area.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Nov 17, 2007
    #5
  6. Keith Thompson said:

    > January Weiner wrote:
    >> user923005 <> wrote:
    >>> Your post is more topical in news:comp.programming {follow-ups set}.

    > [...]
    >>> "I need a hash, which should I use?" is such an open question that any
    >>> simple answer will probably be lacking.

    >>
    >> "I need a hash, which of the ones in the standard C libraries should I
    >> use" is a more specific question, is it not?

    >
    > Yes, it is. And the answer is, there are no hash functions in the
    > standard C library.


    Cases could be made for some of the math functions, and perhaps rand. Not
    particularly good cases, admittedly, but theoretically...

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
    Richard Heathfield, Nov 17, 2007
    #6
  7. "January Weiner" <> wrote in message
    > Hello,
    >
    > I need to use a hashing function for relatively short strings (roughly 16
    > characters or less). The data that needs to be accessed via hash is just
    > a
    > simple int. I just need to look up values in two dimensional matrices each
    > time that I encounter a pair of these strings.
    >
    > I would like to keep the code as simple and standard as
    > possible, but at the same time to have a reasonable performance.
    >
    > While looking through the docs on my system I found a few different
    > implementations:
    > - hsearch from <search.h>
    > - tsearch from <search.h>
    > - bsearch from <stdlib.h>
    >
    > Which one would you recommend? Or maybe something completly different?
    > Right now I just use Perl to do the hashing :) this is nice, but I would
    > love to have a small, efficient, ANSI C program.
    >

    There's a general-purpose hash function on my website - under Basic
    Algorithms, sample chapter on hash tables.
    Whilst it is not specially designed for strings, it's pretty good. Use it
    for now, and replace it only if performance is unacceptable.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Nov 17, 2007
    #7
  8. CBFalconer <> wrote:
    > There are a couple of very simple, but efficient, string hashing
    > routines available (in source form) in hashlib.zip. Although
    > hashlib is copyright and released under GPL, those functions are
    > pretty common knowledge and there will be no objections to
    > including them in anything else. You can find hashlib.zip at:


    Thank you very much, that is precisely the answer that I needed.

    January
    January Weiner, Nov 18, 2007
    #8
  9. Malcolm McLean <> wrote:
    > There's a general-purpose hash function on my website - under Basic
    > Algorithms, sample chapter on hash tables.
    > Whilst it is not specially designed for strings, it's pretty good. Use it
    > for now, and replace it only if performance is unacceptable.


    Thanks!

    January
    January Weiner, Nov 18, 2007
    #9
  10. Keith Thompson <> wrote:
    > January Weiner wrote:
    > > "I need a hash, which of the ones in the standard C
    > > libraries should I use" is a more specific question,
    > > is it not?

    >
    > Yes, it is. And the answer is, there are no hash functions
    > in the standard C library.


    There are at least two that fit the raw definition: strlen()
    and strtoul(..., ..., 36). Not necessarily useful, but forms
    of hashing nonetheless.

    --
    Peter
    Peter Nilsson, Nov 18, 2007
    #10
  11. January Weiner

    user923005 Guest

    On Nov 17, 12:09 am, January Weiner <> wrote:
    > user923005 <> wrote:
    > > Your post is more topical in news:comp.programming {follow-ups set}.

    >
    > Well, it is not. Not 100%. The reason being that I am looking for
    > something that is part of the C standard and that is simple to use in C.


    The C standard does not contain a hash function. Quite a pity, too.
    What hash function you should use will be a function of the
    requirements.

    The SDBM and Bernstein are good, simple, fast hashes used to operate
    on strings when you need something like a join or lookup and a few
    rare collisions are OK.
    If you need a much lower chance of collision, then a 64 bit hash like
    UMAC is probably what is desired.
    If you need 0% chance of collisions, then generate a perfect hash
    (obviously, this only works if the inputs are distinct).

    > What do you think happens when I ask a question like that in
    > comp.programming? Some wise guy will set the Followup-To to comp.lang.c
    > :p, that's what would happen.


    Well, he would be a wise guy. True, the group news:comp.programming
    is not specifically for algorithm discussion, but it's the closest
    thing we've got.

    > Finally, the reason is that I am not a
    > computer scientist and need, hm, very practical advice. Practical advice
    > is something that I always got plenty on comp.lang.c :) Note that you will
    > find a discussion on different hashing algorithms in the FAQ of this group
    > -- however, I do not fully understand it (otherwise I would not be asking
    > the questions).


    The practical advice you get here will be of the same quality as
    news:comp.lang.c. Indeed, many of the c.l.c regulars read and post in
    both places.

    > > I do not think that your problem has an answer yet. It will depend on
    > > lots of things.
    > > For instance, are the strings a static set that never changes? If so,
    > > then a perfect hash is your choice.
    > > There are string searching algorithms faster than a hash table (See
    > > Sedgewick).

    >
    > OK, as I am not a computer scientist, I will try to look it up somewhere.
    > Meanwhile, which of that is a part of standard C libraries?


    None of them.

    > > Do you need to preserve the data (e.g in a key/value pair set on
    > > disk)?

    >
    > Nope, I would go for DBM or SQL if I needed that.
    >
    > > "I need a hash, which should I use?" is such an open question that any
    > > simple answer will probably be lacking.

    >
    > "I need a hash, which of the ones in the standard C libraries should I use"
    > is a more specific question, is it not?


    Yes. Unfortunately, the answer is not the one you would like to hear.
    I guess that this site might be helpful:
    http://www.partow.net/programming/hashfunctions/

    I guess further that if you state your requirements precisely you will
    get better answers.
    user923005, Nov 19, 2007
    #11
  12. January Weiner

    Paul Hsieh Guest

    On Nov 16, 3:42 pm, January Weiner <> wrote:
    > Hello,
    >
    > I need to use a hashing function for relatively short strings (roughly
    > 16 characters or less). The data that needs to be accessed via hash is
    > just a simple int. I just need to look up values in two dimensional
    > matrices each time that I encounter a pair of these strings.


    There is a google group dedicated to this sort of thing:

    http://groups.google.com/group/hash-functions

    --
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
    Paul Hsieh, Nov 20, 2007
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Owen Jacobson
    Replies:
    3
    Views:
    4,578
    CBFalconer
    May 26, 2005
  2. Pat

    Hashing function

    Pat, May 18, 2004, in forum: C++
    Replies:
    2
    Views:
    462
    Ivan Vecerina
    May 18, 2004
  3. Matt Bull

    Hashing function (possibly OT)

    Matt Bull, Jul 7, 2003, in forum: C Programming
    Replies:
    2
    Views:
    327
    Matt Bull
    Jul 7, 2003
  4. Dave

    Hashing Function

    Dave, Nov 17, 2005, in forum: Python
    Replies:
    0
    Views:
    308
  5. Nenad Cikic
    Replies:
    1
    Views:
    121
    Nenad Cikic
    Dec 15, 2012
Loading...

Share This Page