How to implement a Hash Table in C

M

Malcolm McLean

Richard Heathfield said:
It was more of a meta-discussion really - the intent was to encourage
Malcolm to fold my points back onto the discussion of his algorithms
book. But your point is nevertheless well-taken.
I observed that the reaction to Basic Algorithms was almost identical to the
reaction to 12 Common Atheist Arguments (refuted). I wouldn't really want to
pursue the point further, but it is telling.
 
R

Richard Heathfield

Malcolm McLean said:
I observed that the reaction to Basic Algorithms was almost identical
to the reaction to 12 Common Atheist Arguments (refuted). I wouldn't
really want to pursue the point further, but it is telling.

Indeed, but not necessarily for the reason you imagine.
 
K

Keith Thompson

Malcolm McLean said:
I observed that the reaction to Basic Algorithms was almost identical
to the reaction to 12 Common Atheist Arguments (refuted). I wouldn't
really want to pursue the point further, but it is telling.

Telling? How exactly is it telling?

The reasons for the reaction to your Basic Algorithms book have been
discussed at great length. They have nothing to do with the author.
I, for one, would have been delighted if you had written and published
a *good* Basic Algorithms book, one that presented well-written code
that both presented the algorithms clearly and demonstrated a strong
command of whatever implementation language you chose (it happened to
be C). Minor errors in such a work are nearly inevitable; time
permitting, I would have been glad to review the book and point them
out so they can be corrected.

I can't claim to speak for anyone else, but I'm sure that many other
people here feel the same way.

But when several people read a sample chapter and find numerous
blatant errors (code that doesn't even compile, algorithms that
exhibit undefined behavior, stubborn refusal to use the features of
the language), blaming others for their reaction is absurd.

I might be interested in discussing your atheism book, but I won't do
so here, except to suggest that if two books by the same author
receive similar reactions, you should consider the possibility that
the common factor is the author and his writing style, not some
conspiracy by others to denigrate the author personally.
 
M

Malcolm McLean

CBFalconer said:
... snip all further elucidation ...

I suspect you need go no further to make an enemy.
No, I'm happy to have my works condemned.
Even with Ivor Rockbrain it's more a case of "this post is pure abuse, I'd
better not tolerate it" than anything personal.

Keith's point that "the general consensus on clc is that the book is no
good, therefore it is no good" was false, and naturally I had to reply to
that, by giving the real explanation. But I don't blame people for acting as
they do. Kenny McCormack is essentially right when he sees most of the
bandwidth on clc as dominance games, but where he is wrong is in imagining
that life could be any different.

Life is a game, and the vast majority of people are opponents. not enemies.
 
K

Keith Thompson

Malcolm McLean said:
Keith's point that "the general consensus on clc is that the book is
no good, therefore it is no good" was false, and naturally I had to
reply to that, by giving the real explanation.
[...]

I don't recall saying that. But if the general consensus on clc is
that the book is no good, you should seriously consider the
possibility that the general consensus may have some truth behind it.
 
F

Flash Gordon

Malcolm McLean wrote, On 15/08/07 22:27:
No, I'm happy to have my works condemned.

You should not be. You should want to improve your works until they are
considered good.
Even with Ivor Rockbrain it's more a case of "this post is pure abuse,
I'd better not tolerate it" than anything personal.

Keith's point that "the general consensus on clc is that the book is no
good, therefore it is no good" was false, and naturally I had to reply
to that, by giving the real explanation.

That is what you think the reason is. I can't speak for anyone else on
this but it is not my reason for criticising it. My criticism is that
the code is poor as C code and even ignoring the C issues in at least
some cases is poor as examples of algorithms.

You should also note that the C code in "Numerical Recipes in C" has
been criticised here because it is also bad C code. IIRC on at least one
occasions the original version was complemented in the same post because
the Fortran code was good Fortran code. On the other hand, there have
been at least a couple of books by regulars that by regulars past and
present are generally well received. So criticism is not based on
whether someone is a poster here.

If you at least added an errata with the things you accept could have
been better that would help.

Alternatively, you could do all your example in your MiniBasic, then as
it is your language no one could complain that the examples were using
the language badly.
But I don't blame people for
acting as they do. Kenny McCormack is essentially right when he sees
most of the bandwidth on clc as dominance games, but where he is wrong
is in imagining that life could be any different.

Life is a game, and the vast majority of people are opponents. not enemies.

If you take the attitude that most people are against you then your
treating them as opponents is likely to make them react badly to you.
If, on the other hand, you worked on the assumption that criticism was
of the material and intended to be helpful, rather than of you, then you
would find that peoples reactions to you would be far better.
 
B

Ben Bacarisse

Malcolm McLean said:
No, I'm happy to have my works condemned.

"Alas, to wear the mantle of Galileo it is not enough that you be
persecuted by an unkind establishment, you must also be right."
Bob Park

This new twist you have thrown in (that because two things you have
written have drawn criticism, there might be something biased about the
criticism) has conveniently side-tracked the thread.

Since you complained about what you see as "trivial" criticisms, I
raised a bigger issue which you have not commented on, so I'll ask
yet again: what use is your (non) queue implementation? How could it
do anything but baffle a student? If you are worried about c.l.c
topicality, why not re-post on comp.programming where the wider issues
can be discussed?
 
K

Kelsey Bjarnason

[snips]

No, I'm happy to have my works condemned.

Good. I'm creating a post in alt.atheism, titled "Malcom McLean", where
I'm responding to your "12 arguments" book.
 
R

Richard Heathfield

Kelsey Bjarnason said:
[snips]

No, I'm happy to have my works condemned.

Good. I'm creating a post in alt.atheism, titled "Malcom McLean",
where I'm responding to your "12 arguments" book.

Is there still time to change it to "Malcolm McLean"?

Not that I plan on joining the thread. To do so, I'd have to read the
book, and I don't have sufficient motivation to do that, alas. But
getting someone's name right is a matter of courtesy, is it not?
 
C

CBFalconer

Malcolm said:
No, I'm happy to have my works condemned. Even with Ivor Rockbrain
it's more a case of "this post is pure abuse, I'd better not
tolerate it" than anything personal.

Keith's point that "the general consensus on clc is that the book
is no good, therefore it is no good" was false, and naturally I had
to reply to that, by giving the real explanation. But I don't blame
people for acting as they do. Kenny McCormack is essentially right
when he sees most of the bandwidth on clc as dominance games, but
where he is wrong is in imagining that life could be any different.

McCormack is a troll and a joke. Plonk it. Keith is generally
respected, and accurate, as is Eric Sosman.
 
K

Kelsey Bjarnason

Kelsey Bjarnason said:
[snips]

No, I'm happy to have my works condemned.

Good. I'm creating a post in alt.atheism, titled "Malcom McLean",
where I'm responding to your "12 arguments" book.

Is there still time to change it to "Malcolm McLean"?

My bad. Even worse, I did it there, too. Oh well - apologies to Malcolm
on that one.
Not that I plan on joining the thread. To do so, I'd have to read the
book, and I don't have sufficient motivation to do that, alas. But
getting someone's name right is a matter of courtesy, is it not?

So I can't type - compilers generally complain about mistyped identifiers. :)
 
F

Flash Gordon

Kelsey Bjarnason wrote, On 16/08/07 17:21:

So I can't type - compilers generally complain about mistyped identifiers. :)

Not if you mistype them consistently :)
 
K

Kelsey Bjarnason

Kelsey Bjarnason wrote, On 16/08/07 17:21:



Not if you mistype them consistently :)

If I could do that, I'd be happy. As it stands, I live in constant fear
that even the automated spell checkers around me will at some point rise
up against me for cruel and unusual punishment.

Well, not quite, but good goat, my non-formal writing does tend to contain
more than its fair share of typos.
 
M

Malcolm McLean

Ben Bacarisse said:
Since you complained about what you see as "trivial" criticisms, I
raised a bigger issue which you have not commented on, so I'll ask
yet again: what use is your (non) queue implementation? How could it
do anything but baffle a student? If you are worried about c.l.c
topicality, why not re-post on comp.programming where the wider issues
can be discussed?
I didn't complain about trivial criticisms. If there is typo that means a
closing parenthesis is a bracket, that's worth knowing and pointing out.
What I dispute is that several trivial criticisms of this sort add up to a
major criticism. I suppose there is a point at which they do, but I don't
think we're anywhere near there.

A lot of algorithm books make a big meal of the basic data structures. I
checked the Java queue classes before writing this post, and they were
pretty complicated. I takes two interfaces, the Collection and the Iterable,
and there are nine implementations - linked list queues, delay queues, and
so on. Then you've got add methods, which throw execeptions if the queue is
full, and offer methods which return an error code, so that you can decide
whether the queue filling up is or is not part of normal operation. It's
quite an impressive interface.

However the structures chapter is onyl chapter two. I want to describe the
simplicity of the structure instead of its complexity. I could have provided
something similar to the Java classes, using void pointers and access
functions. But it is unlikely that anyone would use such a function suite.
If you want a trivial queue you have a buffer, a circular buffer if you
don't want the inefficiency of shifting everything up on every remove
operation. Just as you build linked lists by incorporating a "next" pointer
in your structure, and trees built as you go. Whether this says something
good or somethign bad about C is a moot point. The fact is that no generic
basic data structure libraries ahve gained acceptance. If I was writing the
book in Java it would be different.

The whole chapter says "these are are basic data structures you will need,
here are some code fragments which show how to implement them". The reader
should be able to extend the expandable array to produce a dynamically
expanding queue, add functions to check for length and spare capacity, and
so on. An "empty" function is pretty patronising. The priority queue,
implemented efficiently, is the red black tree, but it is appropriate to
hold that back. Actually it is possible to speed up the deletion, I
believe - a web seach on "priority queue" and "efficiency" turns up a slew
of algorithms. The book is Basic Algorithms, covering a wide range of topics
in outline, not a massively detailed description of any one.
 
B

Ben Pfaff

Malcolm McLean said:
I could have provided something similar to the Java classes,
using void pointers and access functions. But it is unlikely
that anyone would use such a function suite. If you want a
trivial queue you have a buffer, a circular buffer if you don't
want the inefficiency of shifting everything up on every remove
operation.

A circular buffer can be encapsulated in C and still have some
benefits for code clarity, without reducing efficiency. At
least, that's my own evaluation of my own implementation of
circular queues in C:
http://cvs.savannah.gnu.org/viewvc/...oot=pspp&revision=1.5&content-type=text/plain
http://cvs.savannah.gnu.org/viewvc/...oot=pspp&revision=1.4&content-type=text/plain
Commentary is welcome of course.
The priority queue, implemented efficiently, is the red
black tree, but it is appropriate to hold that back.

You might have a look around for papers on evaluation of priority
queue performance. Red-black trees do pretty well if I recall
correctly, but their performance is not the best in every case.
Heaps are simpler and their performance is competitive for many
purposes. Here is my general-purpose heap implementation in C:
http://cvs.savannah.gnu.org/viewvc/...pspp/heap.h?root=pspp&content-type=text/plain
http://cvs.savannah.gnu.org/viewvc/pspp/src/libpspp/heap.c?root=pspp&view=log

(I am not sure that the code I link to above is
comp.lang.c-compliant, but it should not be too far from it.)
 
M

Malcolm McLean

Ben Pfaff said:
A circular buffer can be encapsulated in C and still have some
benefits for code clarity, without reducing efficiency. At
least, that's my own evaluation of my own implementation of
circular queues in C:

http://cvs.savannah.gnu.org/viewvc/...oot=pspp&revision=1.5&content-type=text/plain

http://cvs.savannah.gnu.org/viewvc/...oot=pspp&revision=1.4&content-type=text/plain
Commentary is welcome of course.
/* Initializes DEQUE as an empty deque of elements ELEM_SIZE
bytes in size, with an initial capacity of at least
CAPACITY. Returns the initial deque data array. */
void *
deque_init (struct deque *deque, size_t capacity, size_t elem_size)
{
void *data = NULL;
deque_init_null (deque);
if (capacity > 0)
{
deque->capacity = 1;
while (deque->capacity < capacity)
deque->capacity <<= 1;
data = xnmalloc (deque->capacity, elem_size);
}
return data;
}

You need to check that capacity is less than 1 << (size_t_bits -1) or the
code will go into an infinite loop.
It's not a practical problem, of course, but it is typical of the sorts of
bugs that do creep into code. I was going to launch into an anti-size_t
tirade and the way in which it gives false promises of security, but I'm too
tired and I've got to prepare a mighty defence of another of my books which
is under attack on another ng.
 
B

Ben Pfaff

while (deque->capacity < capacity)
deque->capacity <<= 1;
You need to check that capacity is less than 1 << (size_t_bits -1) or
the code will go into an infinite loop.

This is a good point. Thank you. However, there's no way to
solve the problem that doesn't violate the postcondition stated
in the function header, so I think I'll "solve" it by adding an
assertion.
 
C

CBFalconer

Malcolm said:
.... snip ...

/* Initializes DEQUE as an empty deque of elements ELEM_SIZE
bytes in size, with an initial capacity of at least
CAPACITY. Returns the initial deque data array. */
void *
deque_init (struct deque *deque, size_t capacity, size_t elem_size)
{
void *data = NULL;

deque_init_null (deque);
if (capacity > 0) {
deque->capacity = 1;
while (deque->capacity < capacity)
deque->capacity <<= 1;
data = xnmalloc (deque->capacity, elem_size);
}
return data;
}

Makes no sense, lacking descriptions of struct deque,
deque_init_null, xnmalloc.
 
J

James Dow Allen

Two bucks in one day? Hmmm. I'm not talking to
you any more without my lawyer.

But I thought you had a huge supply of rounding-error
halfpennies from your days hacking bank computers :)
But seriously, there are also advantages in making the table size an
integer power of 2 (which, in all but one case, fails the primality
test with a vengeance), since the hash value can then be reduced to the
table size more rapidly.

Gak! You're not coding ala Falconer are you?
(Doing a division on every reprobe, instead
of a simple add-compare-subtract.) Most hashers
will "want to" divide by a prime for scrambling
anyway, so the index reduction is free.

(BTW, when memory is at a premium one may wish
to expand tables, when required, by less than 100%,
precluding any guarantee of power-of-two.)

Speaking of Falconer, his method (linear probing
with second hash value used for probe increment)
*requires* (even more so than quadratic probing,
see upthread) prime table size.

In my posting I mentioned "interesting" recent
methods like Cuckoo and Compact Hash and got
little response. Is this because the
"Google Groups" kill-files me? (Or, because
people have taken to kill-filing
"James Dow Allen" specifically?)
I'm not talking to
you any more without my lawyer.

Well, have a nice day anyway!
James
 
U

user923005

But I thought you had a huge supply of rounding-error
halfpennies from your days hacking bank computers :)


Gak! You're not coding ala Falconer are you?
(Doing a division on every reprobe, instead
of a simple add-compare-subtract.) Most hashers
will "want to" divide by a prime for scrambling
anyway, so the index reduction is free.

When table size is an integral power of 2, the address can be
calculated from the hash with a single AND operation.
(BTW, when memory is at a premium one may wish
to expand tables, when required, by less than 100%,
precluding any guarantee of power-of-two.)

Speaking of Falconer, his method (linear probing
with second hash value used for probe increment)
*requires* (even more so than quadratic probing,
see upthread) prime table size.

In my posting I mentioned "interesting" recent
methods like Cuckoo and Compact Hash and got
little response. Is this because the
"Google Groups" kill-files me? (Or, because
people have taken to kill-filing
"James Dow Allen" specifically?)

I mentioned cuckoo hash also.
This one is easy to understand:
http://www.mpi-inf.mpg.de/~sanders/programs/cuckoo/
It is written in C++ but would be trivial to rewrite in C.

I do not know of any useful implementation of compact hashing that has
a license that would allow me to use it, and I have not read the paper
on compact hashing yet.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,682
Members
48,796
Latest member
Greg L.

Latest Threads

Top