Hash table Implementation

Discussion in 'C Programming' started by aki, Mar 29, 2011.

  1. aki

    aki Guest

    Hi folks ,

    i am working on hastable nowdays and looking ways to optimize my
    code . Please comment on my below observations and let me know if you
    have any suggestions.

    I have an existing implementation of chained hash table (A), which
    uses singly link list for chaining.

    typedef struct hnode HNODE;
    struct hnode{
    HNODE *link;
    };

    typedef struct {
    unsigned long (*hash)(const void *);
    int (*compare)(const void *, const void *);
    const void *(*keyof)(const HNODE *);
    HNODE **table;
    unsigned int buckets;
    unsigned int count;
    } HASH;


    The existing implementation caters a huge amount of data.
    Well the deletion or search of a hash node based on key is not a quick
    operation . As this would require
    calculating the index using hash func , and then searching the linked
    list on that index for the node based on values.

    I would like to optimize the search and deletion .

    so i thought to modify my existing hash table (A)implemetaion .
    1 . using a doubly link list for chaining .
    2 . using a separate data stucture , preferably an other hash table
    (B)
    which would store a address of node inserted in hashtable (A) .
    Nodes in B would be inserted parallel to A .

    Nodes in A would be deleted using B , and the searching of node would
    also done using B .
    Nodes in B would be deleted parallel to A .


    Presently A has hash node structure as :



    I would like to perform the modification in hnode as below

    struct hnode{
    HNODE *link_pre;
    HNODE *link_post;
    };

    Would this give me the desired outcome.

    Thanks
    Aki
    aki, Mar 29, 2011
    #1
    1. Advertising

  2. aki

    Eric Sosman Guest

    On 3/29/2011 12:49 AM, aki wrote:
    > Hi folks ,
    >
    > i am working on hastable nowdays and looking ways to optimize my
    > code . Please comment on my below observations and let me know if you
    > have any suggestions.
    >
    > I have an existing implementation of chained hash table (A), which
    > uses singly link list for chaining.
    >
    > typedef struct hnode HNODE;
    > struct hnode{
    > HNODE *link;
    > };
    >
    > typedef struct {
    > unsigned long (*hash)(const void *);
    > int (*compare)(const void *, const void *);
    > const void *(*keyof)(const HNODE *);
    > HNODE **table;
    > unsigned int buckets;
    > unsigned int count;
    > } HASH;
    >
    >
    > The existing implementation caters a huge amount of data.
    > Well the deletion or search of a hash node based on key is not a quick
    > operation . As this would require
    > calculating the index using hash func , and then searching the linked
    > list on that index for the node based on values.


    But the linked list should be short; that's the reason for
    hashing in the first place. Yes, you could be unlucky and have
    all the keys hash to just a few buckets -- but if that's the case
    micro-optimizing is not going to help. Not enough, anyhow.

    > I would like to optimize the search and deletion .


    Question: Have you actually measured the performance and found
    it to be too slow, or are you just guessing that it might be? Do
    not optimize until you have measured. Even if you are *sure* that
    optimization will be needed, measure first so you can find out how
    much improvement you get -- or detect any disimprovements!

    > so i thought to modify my existing hash table (A)implemetaion .
    > 1 . using a doubly link list for chaining .


    This won't make things any faster, since you'll still need to
    search the list. It might actually slow you down a little, since
    there's more work involved in maintaining two links than one, and
    since doubling the HNODE size means only half as many can fit in
    the various memory caches.

    > 2 . using a separate data stucture , preferably an other hash table
    > (B)
    > which would store a address of node inserted in hashtable (A) .
    > Nodes in B would be inserted parallel to A .
    >
    > Nodes in A would be deleted using B , and the searching of node would
    > also done using B .
    > Nodes in B would be deleted parallel to A .


    Doesn't this just make things worse? You'll face exactly the
    same problem with (B) that you now face with (A), namely, locating
    and/or deleting an item given its key. Except now, after you've
    done all the (B) work, you still have to go back and fuss with (A).
    Where's the benefit?

    > Presently A has hash node structure as :
    >
    >
    >
    > I would like to perform the modification in hnode as below
    >
    > struct hnode{
    > HNODE *link_pre;
    > HNODE *link_post;
    > };
    >
    > Would this give me the desired outcome.


    I don't see how it could, but perhaps I have not understood
    what you intend to do. Whatever your intent, though, measure!

    --
    Eric Sosman
    d
    Eric Sosman, Mar 29, 2011
    #2
    1. Advertising

  3. aki <> writes:

    > i am working on hastable nowdays and looking ways to optimize my
    > code .


    You are considering alternative data structures. The term "optimising
    code" usually refers to something less dramatic. But it also means that
    you are likely to get fuller answers in a group that is not limited by
    language. How you organise your hash table is not specifically a C
    question.

    <snip>
    > I would like to optimize the search and deletion .
    >
    > so i thought to modify my existing hash table (A)implemetaion .
    > 1 . using a doubly link list for chaining .


    Why? At first glance that just uses more space for no gain.

    I'm leaving this comment though having read the rest I suspect the 1 and
    2 go together and since I don't understand 2 (below) it is not
    surprising that I am puzzled by 1.

    > 2 . using a separate data stucture , preferably an other hash table
    > (B)
    > which would store a address of node inserted in hashtable (A) .
    > Nodes in B would be inserted parallel to A .
    >
    > Nodes in A would be deleted using B , and the searching of node would
    > also done using B .
    > Nodes in B would be deleted parallel to A .


    I can't follow what you are suggesting here, but as a rule of thumb I
    think it the case that whatever space is used for "other" data
    structures in a hash table is better used by simply making the original
    table larger and using a good probing algorithm. This is a much studied
    area and to get the bet advice I'd advice you post in comp.programing.

    <snip>
    --
    Ben.
    Ben Bacarisse, Mar 29, 2011
    #3
  4. aki

    Jorgen Grahn Guest

    On Tue, 2011-03-29, aki wrote:
    > Hi folks ,
    >
    > i am working on hastable nowdays and looking ways to optimize my
    > code . Please comment on my below observations and let me know if you
    > have any suggestions.

    ....

    > typedef struct {
    > unsigned long (*hash)(const void *);
    > int (*compare)(const void *, const void *);
    > const void *(*keyof)(const HNODE *);


    This is something I really don't like -- partly because of my C++
    background and partly because I have once fought serious real-life
    performance problems with exactly that hash table design[0].

    At the speed-critical core of your algorithms, you have an untyped
    function pointer interface. Things which are likely very fast
    calculations end up as (relatively speaking) expensive non-optimizable
    function calls. Have you ever picked one likely element type and
    hand-coded a specialized version of the hash table, without function
    pointers, for that type? What was the speed gain?

    Perhaps it's possible to do something similar to what the Linux kernel
    implementation of red-black trees[1] does: it provides functions which
    performs the general operations like rebalancing, but requires the
    user to write (for each key type) the trivial wrapper code.

    /Jorgen

    [0] I solved it by noting that the key space was just 65K, and I could
    simply use an array instead of a hash table ... so I didn't get to
    rewrite the hash table code.
    [1] Google linux rbtree to see it.

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Apr 6, 2011
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Diane
    Replies:
    7
    Views:
    4,772
    Roland Pibinger
    Apr 4, 2006
  2. Ryan Kaskel
    Replies:
    1
    Views:
    546
    Ryan Kaskel
    Apr 26, 2006
  3. Jim Cobban
    Replies:
    8
    Views:
    528
    Jerry Coffin
    Apr 17, 2008
  4. rp
    Replies:
    1
    Views:
    491
    red floyd
    Nov 10, 2011
  5. Srijayanth Sridhar
    Replies:
    19
    Views:
    595
    David A. Black
    Jul 2, 2008
Loading...

Share This Page