How to implement a Hash Table in C

M

Malcolm McLean

Flash Gordon said:
Malcolm McLean wrote, On 12/08/07 19:22:

So fix it and all the other reported bugs. You should also add a change
record to the book so that people know what you have fixed.


Perhaps he thought you should actually test the code in your book when you
are charging for it? Proper testing would have included filling the hash
table.


Actually, I would say it shows that your implementation makes it harder
for people to see how the algorithm works. A decent teaching
implementation of a hash would, IMHO, include the check rather than
relying on how another non-standard (although provided) function behaves.
I'm inclined to agree. You pass in a capacity, but it would be a lot clearer
if the function explicitly rejected attempts to fill the table beyond that
capacity.
 
B

Ben Bacarisse

Malcolm McLean said:
Yes, that is another bug.
fixedallocate should have an if(answer) before setting top to answer->next.
Why not say that instead of all that vague talk about catastrophic
behaviour when the hash table gets full?

I am sorry that my report was not clear. I am not reacting to this
topic rationally. I find the idea if charging money for a book of
this quality hard to accept, and my anger at that has affected the
clarity of my postings.
(The other bug complaint about the code getting stuck in a loop when
the table gets full is false. Which I don't mind, we can all make
mistakes, but at least admit that you are wrong. Otherwise people
wonder if you really understand hash tables at all.)

Yes, that was my mistake. I don't think it shows your design in a
good light, however. It helps to follow a design if one can think of
the pieces separately, and the correctness of your hash functions all
rely on the allocator failing at the right point. A change to the
code to replace the allocator would break it in a mysterious way.
This kind linkage between components is not desirable and it is
particularly odd to have used it in a teaching book.

Will you accept that readers should get a refund when they see the
code you present for implementing a queue as a circular buffer?
 
R

Richard Heathfield

Malcolm McLean said:
Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

Please don't do a navia on us, Malcolm. If your book is good enough,
other people here will recommend it when appropriate. And if it isn't
good enough, nobody here should be recommending it, least of all you.
 
R

Richard Heathfield

CBFalconer said:
I am not sure what your question is.

It seems pretty clear to me. But he needs to decide what language he's
writing in before he starts worrying about difficult stuff.
However, a complete, and
portable, hash table implementation under GPL

....does not answer the question, any more than "what's the best kind of
honey?" is answered by "here, have one of my honey sandwiches".
 
R

Richard Heathfield

Malcolm McLean said:

There are certain advantages in making the hash table length prime.

If you use quadratic probing as your collision avoidance mechanism, a
prime hash table length would seem to me to be essential, not just a
"certain advantage". If, on the other hand, you treat the hash table as
an array of collections of some kind, then I see no advantage at all to
the hash table length being prime.

A book that covers hash tables ought, at the very least, to mention
quadratic probing, and demonstrate how it can get into a cycle with
composite hash table sizes.

For example, consider a hash table with ten elements, of which elements
0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2, 6 and 7
empty, so the table is only 60% full).

Now try to add an element at index 4, using quadratic probing, and you
will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5,
8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.

Using a prime number guarantees that you won't get into this cycle
unless the table is actually full (which you can check separately, in
advance).

Primes aren't just an advantage for quadratic probing - they're
essential.
 
P

pete

Richard said:
CBFalconer said:


It seems pretty clear to me. But he needs to decide what language he's
writing in before he starts worrying about difficult stuff.

Does "ds" really mean "difficult stuff"?
 
R

Richard Heathfield

pete said:
Does "ds" really mean "difficult stuff"?

<grin> Pure coincidence, I'm afraid.

If he decides on C++, his answer is probably to be found in the STL. If
he chooses C, he'll just want to slap an array of objects together if
he's using linear or quadratic probing, or an array of collections
otherwise (where "collection" means something like a linked list, a
binary search tree, or maybe even another hash table).
 
F

Frodo Baggins

Malcolm McLean said:
As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

I think you should fix that and the other clear errors[1] that have been
pointed out before you call other people arrogant. You say in the
introduction that bugs can be "corrected very quickly", but you don't
seem to have corrected anything.

I would suggest using splint (http://www.splint.org/) if you haven't
done so already.

Regards,
Frodo B
 
C

CBFalconer

Richard said:
.... snip ...

For example, consider a hash table with ten elements, of which
elements 0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2,
6 and 7 empty, so the table is only 60% full).

Now try to add an element at index 4, using quadratic probing, and
you will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8,
5, 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.

Using a prime number guarantees that you won't get into this cycle
unless the table is actually full (which you can check separately,
in advance).

Primes aren't just an advantage for quadratic probing - they're
essential.

That's a nice, compact, and clear explanation. I tried explaining
this to someone some time ago (1 year or more) without success. He
continued to insist that non-prime sizes were suitable.
 
I

Ivar Rosquist

Which ds?
You can read all about hash tables in my book, Basic Algorithms. The
hash tables chapter is free.

Don't bother - that book probably is the buggiest book of the
year.
 
C

CBFalconer

Malcolm said:
.... snip ...


Yes, that is another bug. fixedallocate should have an if(answer)
before setting top to answer->next. Why not say that instead of
all that vague talk about catastrophic behaviour when the hash
table gets full? (The other bug complaint about the code getting
stuck in a loop when the table gets full is false. Which I don't
mind, we can all make mistakes, but at least admit that you are
wrong. Otherwise people wonder if you really understand hash
tables at all.)

I haven't looked at your code, but it appears to be bug infested
from the comments here. Take a look at my hashlib package, which
has been out for some years, and I have had zero bug reports since
the first revision (and that was for a minor memory loss on
closing). Very stable, and purely standard C content. See:

<http://cbfalconer.home.att.net/download>
 
C

CBFalconer

Richard said:
CBFalconer said:

It seems pretty clear to me. But he needs to decide what language
he's writing in before he starts worrying about difficult stuff.


...does not answer the question, any more than "what's the best kind
of honey?" is answered by "here, have one of my honey sandwiches".

What question? What's a 'ds'? What's "C/C++"? In c.l.c I think
we can assume the language is intended to be C.
 
M

Malcolm McLean

Ben Bacarisse said:
I am sorry that my report was not clear. I am not reacting to this
topic rationally. I find the idea if charging money for a book of
this quality hard to accept, and my anger at that has affected the
clarity of my postings.
It's surprising how often people react like that. I get similar accusations
all the time about 12 Common Atheist Arguments (refuted). There of course
there can't be a technical justification.
Yes, that was my mistake. I don't think it shows your design in a
good light, however. It helps to follow a design if one can think of
the pieces separately, and the correctness of your hash functions all
rely on the allocator failing at the right point. A change to the
code to replace the allocator would break it in a mysterious way.
This kind linkage between components is not desirable and it is
particularly odd to have used it in a teaching book.
I think it is clear that the code does need a test against capacity, and
some documentation. So somethign worthwhile has come of all this.
Will you accept that readers should get a refund when they see the
code you present for implementing a queue as a circular buffer?
Maybe it's just me incapable of seeing the problems with my own code, but
what is wrong with a circular buffer? Sure, it doesn't grown dynamically,
but generally you want to ration queue length anyway. If it consistently has
more coming in than you're taking out, there's probably something wrong with
your logic, or the machine is underpowered.
When edition 2 comes out I might give courtesy copies to edition 1 readers
who point out bugs. Maybe even to clc reviewers who have been kind enough to
point out problems. I'm not ungrateful, though sometimes I get a little bit
annoyed at the tone. The snag is that most of the price represents printing
costs, compartively little goes towards my luxury yacht.
 
M

Malcolm McLean

Richard Heathfield said:
Malcolm McLean said:



If you use quadratic probing as your collision avoidance mechanism, a
prime hash table length would seem to me to be essential, not just a
"certain advantage". If, on the other hand, you treat the hash table as
an array of collections of some kind, then I see no advantage at all to
the hash table length being prime.

A book that covers hash tables ought, at the very least, to mention
quadratic probing, and demonstrate how it can get into a cycle with
composite hash table sizes.

For example, consider a hash table with ten elements, of which elements
0, 3, 4, 5, 8 and 9 are full (which leaves elements 1, 2, 6 and 7
empty, so the table is only 60% full).

Now try to add an element at index 4, using quadratic probing, and you
will find yourself trying elements 4, 5, 8, 3, 0, 9, 0, 3, 8, 5, 4, 5,
8, 3, 0, 9, 0, 3, 8, 5, 4, 5, ... ad nauseam.

Using a prime number guarantees that you won't get into this cycle
unless the table is actually full (which you can check separately, in
advance).

Primes aren't just an advantage for quadratic probing - they're
essential.
The hash table chapter includes an example of quadratic probing, which does
as you say.
Maybe the primality test should be more rigorous? The worst code is that
which seems to work, then fails when someone includes it in a life-support
system.
 
M

Mark McIntyre

It's surprising how often people react like that. I get similar accusations
all the time about 12 Common Atheist Arguments (refuted).

<OT>
What, you mean as opposed to the irrationality of the book itself?
</ot>
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
B

Ben Bacarisse

Malcolm McLean said:
It's surprising how often people react like that. I get similar
accusations all the time about 12 Common Atheist Arguments
(refuted).

Do you push that book in groups where only a critique of the grammar
is on topic? If so, then I can see the similarity. If not, then you
missed my point. I am trying very hard to be topical for c.l.c. That
explains, I think, what some have seen as nitpicking. Post in
comp.programming or comp.theory if you want to see some comments on
your book as a whole.
There of course there can't be a technical justification.

You have a strange sense of humour (I assume that was a joke?).

Maybe it's just me incapable of seeing the problems with my own code,
but what is wrong with a circular buffer? Sure, it doesn't grown
dynamically, but generally you want to ration queue length anyway. If
it consistently has more coming in than you're taking out, there's
probably something wrong with your logic, or the machine is
underpowered.

<stay clam>...

No. You are thinking of one kind of queue (where you do want to
balance input rate and output rate) and you've generalised from that.
There are uses of queues where your last statement is simply wrong.
For example, in a breadth first search of a tree using a queue,
continued growth of the queue simply indicates something about the
shape of the tree, not an underpowered machine or faulty logic.

Of course, limiting the queue size is often done, and a circular
buffer is an excellent way to implement a bounded queue. My objection
was that the code you present does not implement a bounded queue.
What you wrote

int queue[100];
int head;
int tail;

void add(int x)
{
if(tail == 100] /* sic */
tail = 0;
queue[tail++] = x;
}

int remove(void)
{
int answer = queue[head];
head++;
if(head == 100)
head = 0;
return answer;
}

is just a circular buffer. A bounded queue would have 'full' and
'empty' tests. The add operation would fail if the queue were full,
and the remove operation (badly named, as has been pointed out) would
fail if it were empty.

What algorithms would your queue implementation be suitable for?
 
R

Richard Heathfield

Malcolm McLean said:
...
The hash table chapter includes an example of quadratic probing, which
does as you say.

Well, I didn't buy the book, so I wouldn't know.
Maybe the primality test should be more rigorous? The worst code is
that which seems to work, then fails when someone includes it in a
life-support system.

The primality test should be *definitive*. There's no excuse for it not
to be. It took me less than 30 seconds - on a relatively ancient
machine - to generate all of the 5761455 primes in the range 1 to
100,000,000. And less than two seconds to generate all the 664579
primes in the range 1 to 10,000,000.

Just how big a hash table did your users want? If it has more than a
hundred million entries, I suspect that the cost of definitive
primality testing will be the least of your problems.
 
J

James Dow Allen

Malcolm McLean said:

If you use quadratic probing as your collision avoidance mechanism, a
prime hash table length would seem to me to be essential, not just a
"certain advantage".

A robust quadratic prober will switch to
linear probing after "a while" (since it
would otherwise be able to access only half
the indexes). For this reason, Richard's
comment would be more robust if "essential"
is replaced with "very highly desireable."
If, on the other hand, you treat the hash table as
an array of collections of some kind, then I see no advantage at all to
the hash table length being prime.

I think there is a good practical reason to
use prime size, unrelated to quadratic probing.

A hash function needs to accomplish two
things: scramble, and reduce to range {0,1,...,N-1}
where N is table size. Two birds can be
(partially) killed with one stone when the last
step is taking remainder (index %= N), but this
scrambles best when N is prime.

(Flamers will be happy to note that we "should"
have scrambled well first but, unless we're
extremely confident in our hash function, why
not get the scrambling benefit of prime-number
remaindering?)

Another interesting issue, related to hashing
though perhaps unrelated to table size, concerns
reversible hash functions ("scrambling bijections").
These are easy to implement on {0,1,2,...,N-1}
*if* N is prime.

James Dow Allen
 
R

Richard Heathfield

James Dow Allen said:

A robust quadratic prober will switch to
linear probing after "a while" (since it
would otherwise be able to access only half
the indexes). For this reason, Richard's
comment would be more robust if "essential"
is replaced with "very highly desireable."

I'll buy that. Here's your dollar.

I think there is a good practical reason to
use prime size, unrelated to quadratic probing.

A hash function needs to accomplish two
things: scramble, and reduce to range {0,1,...,N-1}
where N is table size. Two birds can be
(partially) killed with one stone when the last
step is taking remainder (index %= N), but this
scrambles best when N is prime.

Two bucks in one day? Hmmm. I'm not talking to you any more without my
lawyer.

But seriously, there are also advantages in making the table size an
integer power of 2 (which, in all but one case, fails the primality
test with a vengeance), since the hash value can then be reduced to the
table size more rapidly.

Measure, measure, measure!
 
A

Army1987

Malcolm McLean said:
As for the derefencing null pointer, that will happen if the
constructor fails, and return NULL. That is standard behaviour
throughout the book. If objects cannot be created, the constructing
function returns a null pointer to show they have failed.
I couldn't see another. The code has been tested, but only on two
systems (a UNIX mainframe and Windows) so that doesn't mean no errors
remain, and of course typographical errors do creep in in the process
of reformatting for print - I wish I had a tool to format source code
automatically but I don't. The normal thing when you an error such as
dereference of a null is to say "your code derererences a null" rather
than to talk airly about "catastropic behaviour" in an arrogant
manner.

I am sorry you do not like my tone. I have certainly got worked up
about this subject. Your fixed allocator has an error. The code
cannot return a null pointer without causing undefined behaviour.
Calm enough? You decide how seriously you categorise it.

I think you should fix that and the other clear errors[1] that have been
pointed out before you call other people arrogant. You say in the
introduction that bugs can be "corrected very quickly", but you don't
seem to have corrected anything.

I would suggest using splint (http://www.splint.org/) if you haven't
done so already.

I wouldn't. Many of the warning it finds are bogus. It claims that
in long k = 1; printf("%g\n", k / 2.0) the second argument to
printf() has a wrong type. It complains that the controlling
expression of if(!ferror(stdout)) isn't Boolean, and suggests me
a way to disable warnings when pointers are used as controlling
expressions. It complains about "assigning a character to an int"
when I write i = '\n', but then it remembers "anyway it's safe,
since actually '\n' has type int". If I turn up the warning level
enough, it will also complain about the signed operands in (1<<8)
in the macro expansion of isspace(x) found in ctype.h.
Or better, use it, but do not take what it says too seriously.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,434
Messages
2,571,685
Members
48,796
Latest member
Greg L.

Latest Threads

Top