Hash Table Implementation


aki

Hi folks,

I am working on hash tables nowadays and looking for ways to optimize my
code. Please comment on my observations below and let me know if you
have any suggestions.

I have an existing implementation of a chained hash table (A), which
uses a singly linked list for chaining.

typedef struct hnode HNODE;
struct hnode {
    HNODE *link;
};

typedef struct {
    unsigned long (*hash)(const void *);
    int (*compare)(const void *, const void *);
    const void *(*keyof)(const HNODE *);
    HNODE **table;
    unsigned int buckets;
    unsigned int count;
} HASH;


The existing implementation handles a huge amount of data.
Deletion or search of a hash node by key is not a quick operation,
as it requires calculating the index with the hash function and then
searching the linked list at that index for the node by value.
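
For concreteness, here is a minimal sketch of that search path, using the structures above plus illustrative integer-key callbacks (the callback names and element type are mine, not from the original code):

```c
#include <stddef.h>

typedef struct hnode HNODE;
struct hnode { HNODE *link; };

typedef struct {
    unsigned long (*hash)(const void *);
    int (*compare)(const void *, const void *);
    const void *(*keyof)(const HNODE *);
    HNODE **table;
    unsigned int buckets;
    unsigned int count;
} HASH;

/* Search: one hash call to pick a bucket, then a linear walk of
   that bucket's chain comparing keys.  With a good hash function
   the chain is short, so this is fast on average. */
HNODE *hash_find(const HASH *h, const void *key)
{
    HNODE *n = h->table[h->hash(key) % h->buckets];
    while (n != NULL && h->compare(key, h->keyof(n)) != 0)
        n = n->link;
    return n;
}

/* Illustrative element type and callbacks for integer keys. */
struct inode { HNODE node; int key; };
unsigned long ihash(const void *k) { return (unsigned long)*(const int *)k; }
int icmp(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }
const void *ikey(const HNODE *n) { return &((const struct inode *)n)->key; }
```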

I would like to optimize search and deletion.

So I thought to modify my existing hash table (A) implementation by:
1. using a doubly linked list for chaining;
2. using a separate data structure, preferably another hash table (B),
which would store the address of each node inserted in hash table (A).

Nodes in B would be inserted in parallel with A.
Nodes in A would be deleted using B, and searching for a node would
also be done using B.
Nodes in B would be deleted in parallel with A.


Presently A has the hash node structure shown above.

I would like to modify hnode as below:

struct hnode {
    HNODE *link_pre;
    HNODE *link_post;
};
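
For what it's worth, the one operation a doubly linked chain genuinely speeds up is unlinking a node you already hold a pointer to, with no chain walk. A minimal sketch using the link_pre/link_post names above (the head-pointer handling is my assumption about the surrounding code):

```c
#include <stddef.h>

typedef struct hnode HNODE;
struct hnode {
    HNODE *link_pre;   /* previous node in the chain, NULL at head */
    HNODE *link_post;  /* next node in the chain, NULL at tail */
};

/* Unlink a node given only its address: O(1), no chain walk.
   head points at the bucket's head pointer (needed when n is the
   first node in the chain). */
void chain_unlink(HNODE **head, HNODE *n)
{
    if (n->link_pre != NULL)
        n->link_pre->link_post = n->link_post;
    else
        *head = n->link_post;
    if (n->link_post != NULL)
        n->link_post->link_pre = n->link_pre;
    n->link_pre = n->link_post = NULL;
}
```

Note that this only helps if something else (such as the proposed table B) hands you the node pointer directly; finding the node by key still walks the chain.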

Would this give me the desired outcome?

Thanks
Aki
 

Eric Sosman

> Hi folks ,
>
> i am working on hastable nowdays and looking ways to optimize my
> code . Please comment on my below observations and let me know if you
> have any suggestions.
>
> I have an existing implementation of chained hash table (A), which
> uses singly link list for chaining.
>
> typedef struct hnode HNODE;
> struct hnode{
>     HNODE *link;
> };
>
> typedef struct {
>     unsigned long (*hash)(const void *);
>     int (*compare)(const void *, const void *);
>     const void *(*keyof)(const HNODE *);
>     HNODE **table;
>     unsigned int buckets;
>     unsigned int count;
> } HASH;
>
> The existing implementation caters a huge amount of data.
> Well the deletion or search of a hash node based on key is not a quick
> operation . As this would require
> calculating the index using hash func , and then searching the linked
> list on that index for the node based on values.

But the linked list should be short; that's the reason for
hashing in the first place. Yes, you could be unlucky and have
all the keys hash to just a few buckets -- but if that's the case
micro-optimizing is not going to help. Not enough, anyhow.
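
A sketch of the standard way to keep those chains short: grow the bucket array once count/buckets passes a threshold and redistribute the nodes. The function and type names here are illustrative, not from the posted code:

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct hnode HNODE;
struct hnode { HNODE *link; };

/* Grow the bucket array and redistribute every node.  node_hash
   returns the full hash of a node (the post's HASH would compute
   hash(keyof(n))).  Returns the new table, or NULL on failure;
   the old table is freed on success. */
HNODE **hash_grow(HNODE **old, unsigned int old_buckets,
                  unsigned int new_buckets,
                  unsigned long (*node_hash)(const HNODE *))
{
    HNODE **tab = calloc(new_buckets, sizeof *tab);
    if (tab == NULL)
        return NULL;
    for (unsigned int i = 0; i < old_buckets; i++) {
        for (HNODE *n = old[i], *next; n != NULL; n = next) {
            next = n->link;
            unsigned long j = node_hash(n) % new_buckets;
            n->link = tab[j];            /* push onto the new chain */
            tab[j] = n;
        }
    }
    free(old);
    return tab;
}

/* Illustrative node type whose hash is just a stored value. */
struct gnode { HNODE node; unsigned long v; };
unsigned long gnode_hash(const HNODE *x) { return ((const struct gnode *)x)->v; }
```
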

> I would like to optimize the search and deletion .

Question: Have you actually measured the performance and found
it to be too slow, or are you just guessing that it might be? Do
not optimize until you have measured. Even if you are *sure* that
optimization will be needed, measure first so you can find out how
much improvement you get -- or detect any disimprovements!
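
A minimal sketch of the kind of measurement Eric is urging, using the standard clock() interface (the harness shape is illustrative; a real benchmark would use realistic keys and data sizes):

```c
#include <time.h>

/* Minimal timing harness: run an operation many times and report
   elapsed CPU seconds.  Measure before and after any change so
   improvements (or disimprovements) actually show up as numbers. */
double time_op(void (*op)(void *), void *arg, long iters)
{
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        op(arg);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Example operation: something cheap, standing in for a lookup. */
void bump(void *p) { ++*(long *)p; }
```
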

> so i thought to modify my existing hash table (A)implemetaion .
> 1 . using a doubly link list for chaining .

This won't make things any faster, since you'll still need to
search the list. It might actually slow you down a little, since
there's more work involved in maintaining two links than one, and
since doubling the HNODE size means only half as many can fit in
the various memory caches.

> 2 . using a separate data stucture , preferably an other hash table
> (B)
> which would store a address of node inserted in hashtable (A) .
> Nodes in B would be inserted parallel to A .
>
> Nodes in A would be deleted using B , and the searching of node would
> also done using B .
> Nodes in B would be deleted parallel to A .

Doesn't this just make things worse? You'll face exactly the
same problem with (B) that you now face with (A), namely, locating
and/or deleting an item given its key. Except now, after you've
done all the (B) work, you still have to go back and fuss with (A).
Where's the benefit?

> Presently A has hash node structure as :
>
> I would like to perform the modification in hnode as below
>
> struct hnode{
>     HNODE *link_pre;
>     HNODE *link_post;
> };
>
> Would this give me the desired outcome.

I don't see how it could, but perhaps I have not understood
what you intend to do. Whatever your intent, though, measure!
 

Ben Bacarisse

aki said:
> i am working on hastable nowdays and looking ways to optimize my
> code .

You are considering alternative data structures. The term "optimising
code" usually refers to something less dramatic. It also means you are
likely to get fuller answers in a group that is not limited to one
language; how you organise your hash table is not specifically a C
question.

> I would like to optimize the search and deletion .
>
> so i thought to modify my existing hash table (A)implemetaion .
> 1 . using a doubly link list for chaining .

Why? At first glance that just uses more space for no gain.

I'm leaving this comment though having read the rest I suspect the 1 and
2 go together and since I don't understand 2 (below) it is not
surprising that I am puzzled by 1.

> 2 . using a separate data stucture , preferably an other hash table
> (B)
> which would store a address of node inserted in hashtable (A) .
> Nodes in B would be inserted parallel to A .
>
> Nodes in A would be deleted using B , and the searching of node would
> also done using B .
> Nodes in B would be deleted parallel to A .

I can't follow what you are suggesting here, but as a rule of thumb I
think it is the case that whatever space is used for "other" data
structures in a hash table is better used by simply making the original
table larger and using a good probing algorithm. This is a much-studied
area, and to get the best advice I'd advise you to post in comp.programming.
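
As an illustration of the open-addressing alternative Ben mentions, here is a sketch of linear probing over a single flat array for plain int keys. The table size, the multiplicative hash constant, and the 0-means-empty convention are all assumptions for the example:

```c
#include <stddef.h>

/* Open addressing with linear probing: one flat array, no chain
   pointers at all.  0 is reserved to mean "empty slot", so 0 is
   not a storable key in this sketch. */
#define TSIZE 8u   /* power of two so we can mask instead of mod */

int probe_find(const int table[TSIZE], int key)
{
    unsigned i = (unsigned)key * 2654435761u & (TSIZE - 1);
    for (unsigned step = 0; step < TSIZE; step++) {
        unsigned j = (i + step) & (TSIZE - 1);
        if (table[j] == key) return (int)j;   /* found */
        if (table[j] == 0)  return -1;        /* empty slot: absent */
    }
    return -1;                                /* table full */
}

int probe_insert(int table[TSIZE], int key)
{
    unsigned i = (unsigned)key * 2654435761u & (TSIZE - 1);
    for (unsigned step = 0; step < TSIZE; step++) {
        unsigned j = (i + step) & (TSIZE - 1);
        if (table[j] == 0 || table[j] == key) {
            table[j] = key;
            return (int)j;
        }
    }
    return -1;                                /* table full */
}
```

Deletion under open addressing needs tombstones or slot back-shifting, which is the usual trade-off against chaining.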

<snip>
 

Jorgen Grahn

> Hi folks ,
>
> i am working on hastable nowdays and looking ways to optimize my
> code . Please comment on my below observations and let me know if you
> have any suggestions. ....

> typedef struct {
>     unsigned long (*hash)(const void *);
>     int (*compare)(const void *, const void *);
>     const void *(*keyof)(const HNODE *);

This is something I really don't like -- partly because of my C++
background and partly because I have once fought serious real-life
performance problems with exactly that hash table design[0].

At the speed-critical core of your algorithms, you have an untyped
function pointer interface. Things which are likely very fast
calculations end up as (relatively speaking) expensive non-optimizable
function calls. Have you ever picked one likely element type and
hand-coded a specialized version of the hash table, without function
pointers, for that type? What was the speed gain?

Perhaps it's possible to do something similar to what the Linux kernel
implementation of red-black trees[1] does: it provides functions which
perform the general operations like rebalancing, but requires the
user to write (for each key type) the trivial wrapper code.
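
A sketch of that split applied to the posted chained table: keep the generic intrusive node, but write a small typed wrapper per key type so the hash and compare are ordinary inlinable calls rather than indirect ones. The names and the djb2 hash are illustrative, not from the thread:

```c
#include <stddef.h>
#include <string.h>

typedef struct hnode HNODE;
struct hnode { HNODE *link; };

struct strnode {            /* one concrete element type */
    HNODE node;
    const char *key;
};

static unsigned long str_hash(const char *s)
{
    unsigned long h = 5381;                 /* djb2 */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Typed find: the compiler sees str_hash and strcmp directly and
   can inline both; no indirect calls in the inner loop. */
static struct strnode *str_find(HNODE **table, unsigned int buckets,
                                const char *key)
{
    for (HNODE *n = table[str_hash(key) % buckets]; n; n = n->link) {
        struct strnode *sn = (struct strnode *)n;
        if (strcmp(sn->key, key) == 0)
            return sn;
    }
    return NULL;
}
```
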

/Jorgen

[0] I solved it by noting that the key space was just 65K, and I could
simply use an array instead of a hash table ... so I didn't get to
rewrite the hash table code.
[1] Google linux rbtree to see it.
 
