How to implement a Hash Table in C

Ben Bacarisse · Aug 18, 2007

Richard Heathfield said:
Ben Bacarisse said:

I pointed out the name clash with <stdio.h>'s remove() some days - or
possibly even weeks - ago. That should have been plenty of time to
correct the electronic version of the book (given that MM seems to have
plenty of time to discuss it, so I presume he has enough time to fix it
too - the more important task).

So I went to have a look, to see if it had been fixed yet. (This is the
first time I've looked at the chapter.)

MM's A123 XYZ / B123 CDE example is wrong, in that it suggests checking
Slot 134. If that's empty, MM says, B123 CDE goes goes in Slot 124. I
can see no justification whatsoever for checking slot (S + 10) before
deciding whether to use Slot S.

I did not spot that. There are lots of typos. I get a feeling (from
the density of such errors) that details no not matter to the author.

I was puzzled to find that hash table entries were not abstracted as the
table itself was. "Divide and conquer" is too useful a strategy to be
ignored lightly, especially in teaching.

There is very little "abstracting out". This is one thing (OT for clc
so I have not pressed it much) that I find bad about the book
(particularly so for learners). The stack (in Ch2) has no operations
at all. I would not do this, even when the code is as simple as it is
for a stack, but when the code gets non-trivial, as is the case for
testing if a circular buffer is full or empty[1], it is simply not on
-- unless explicitly set as an exercise, of course.

Oddly, MM seems to be using int rather than size_t for recording lengths
and sizes (neither of which can be negative). Presumably this is a
typo, since no experienced C programmer would deliberately be that
daft.

It is odd, yes, but you know why he does it -- size_t, it seems,
infect and spreads through programs. To my mind, it spreads
documentation through the program about which objects can hold which
sets of values.

He has a point in one respect, that it precludes certain traditional
failure indications. I don't favour using negative returns in that
way but it is often done. However, size_t is there for a reason and
it does a good job. To not use it for that purpose because it
precludes a coding trick one likes is being stubborn to the point of
perversity.

MM seems to be thinking that "quadratic probing" means "square the hash
value" (and that's a direct quote). Either he's misunderstood quadratic
probing completely, or I have. As far as I'm concerned, quadratic
probing has nothing whatsoever to do with the hash value except as a
place to start looking for a slot. My understanding is that, if the
hash value h has so far resulted in n collisions in a table of size s,
then the place to look next is slot (h + n*n) % s - and most certainly
NOT (h * h) % s!

Yes, he does. I missed that. Quadratic probing is usually explained
as inspecting

h(k, i) = h(k) + c1 * i + c2 * i * i

for i = 0, 1 and so on. When c2 == 0 you have linear probing. I have
never seen the term used in the way MM uses it. Squaring the hash
looks like a bad idea, at least on the surface. I wonder if he has
any data about the advantages and disadvantages of doing so.

[1] I know it is simple, but it is not trivial. It used to catch out
more than 50% of my students every year.

Malcolm McLean · Aug 18, 2007

Ben Bacarisse said:
There is very little "abstracting out". This is one thing (OT for clc
so I have not pressed it much) that I find bad about the book
(particularly so for learners). The stack (in Ch2) has no operations
at all. I would not do this, even when the code is as simple as it is
for a stack, but when the code gets non-trivial, as is the case for
testing if a circular buffer is full or empty[1], it is simply not on
-- unless explicitly set as an exercise, of course.

That is actually a non-trival (trivial doesn't mean "not important", it
means something which is just a slip or a minor style issue) criticism.
There should be a discussion about the end and start indicators merging as
the queue fills up, I'd agree.

That is a bit more useful than "the code will not compile under C99 and not
link under C89, therefore the book is shoddily written", because prototypes
aren't included, sort of comment I get most of the time.

It is odd, yes, but you know why he does it -- size_t, it seems,
infect and spreads through programs. To my mind, it spreads
documentation through the program about which objects can hold which
sets of values.

He has a point in one respect, that it precludes certain traditional
failure indications. I don't favour using negative returns in that
way but it is often done. However, size_t is there for a reason and
it does a good job. To not use it for that purpose because it
precludes a coding trick one likes is being stubborn to the point of
perversity.

That's only part of my criticism. It also makes it easier to protect from
garbage. If we check (N >= 0) and N is a random garbage number, then the
check will trigger 50% of the time. If we call N with more than two or three
such garbage values the check is sure to trigger, which is less misleading
than "out of memory".
The main reason however is that size_t sizes are not good unless you also
have size_t indices. I don't like all those underscores. There are way too
many integer types in C, and they are spreading. It's time to fight back and
start making the language smaller rather than bigger.

Yes, he does. I missed that. Quadratic probing is usually explained
as inspecting

h(k, i) = h(k) + c1 * i + c2 * i * i

for i = 0, 1 and so on. When c2 == 0 you have linear probing. I have
never seen the term used in the way MM uses it. Squaring the hash
looks like a bad idea, at least on the surface. I wonder if he has
any data about the advantages and disadvantages of doing so.

I've misunderstood the algorithm and invented something better. (I think).
I'm using the square of the hash value modulus table size as an increment.
So effectively it's secondary hashing and not quadratic probing at all.

[1] I know it is simple, but it is not trivial. It used to catch out
more than 50% of my students every year.

Anyway, the original thread has died down. So it's time to summarise.

Ben Bacarisse · Aug 18, 2007

It is a good description. The writing is rather dry and my car park
metaphor is more interesting. Also I describe how to do deletions with
linear probing. However it is a vey good explanation of hashing, it
would be wrong to say it is not.

The code works on global variables. Whilst this is acceptable for
demonstration code, it also tends to encourage bad programming
habits.

How are we to read this? You say it is acceptable for demonstration
code (which this clearly is) but you seem to want to tack on a
criticism, just in case. You can't have it both ways. You do this for
your stack and your queue.

However you can tell by the tone that mostly these are used as
a pretext for envy.

If the cap fits...

Richard Heathfield · Aug 18, 2007

Malcolm McLean said (in reply to me):

Corrrections are being stored up for an edition 2, which will be
released shortly.

I hope you will forgive us for continuing to breathe in the interim.

[no comment from MM]

You appear to have ignored this item.

There are two tables. One stores data items of any size, one stores
pointers.

Okay, fair point - let's try that one again, editing as appropriate:

I was puzzled to find that hash table entries were not abstracted as the
tables themselves were. "Divide and conquer" is too useful a strategy
to be ignored lightly, especially in teaching.

It's an algorithms book. The algorithms work on integers.

I think you'll find that hashing algorithms generally work on unsigned
integers.

The hash
function then has to to return a size_t, which is confusing to people
using the book to understand how to port to another language.

Fine, so use unsigned long int instead. Anyone can work out what that
means, and it's cross-portable between C90 and C99.

Too hard to read.

No, it isn't.

malloc( sizeof *ptr) is a bad clc ism

No, it's a good clc-ism.

that you will seldom see in another environment.

Right. As Sturgeon says, at least 90% of everything is crud, so you'd
expect to find malloc(sizeof *ptr) in only 10% of cases (or fewer).
That doesn't mean it's bad. It just means most people haven't come
across it yet.

The reader has to perform a mental dereference which makes it harder
to see if the size of correct.

If the reader has to perform a mental dereference, he (or she) doesn't
understand sizeof. No dereferencing occurs, because sizeof does not
evaluate its argument. Furthermore, the size is *bound* to be correct
for any object pointer type.

My convention is to have an opaque structure, and a constructor with
the name of the structure in lower case. This is analagous to C++. As
you say, create_xxx would have the advantage of being a verb.

Better: xxx_create or xxx_Create or something like that.

Read the actual book - preview is available.

I was reading the preview (presumably). Previews allow one to assess the
quality of the book prior to deciding whether to purchase it. I have no
intention of paying money for a book whose preview is of such low
quality.

The html is dumped form
the Open Office book files, unfortunately it doesn't understand code
formatting. I really ought to be able to find a way around that, I
know.

Yes, you ought to, really. I do sympathise about formatting issues -
deciding on a file format in the first place is no easy decision, since
at that stage it's quite probable that you don't know how you're going
to market the book, and you don't know whether other companies will be
involved - will they demand Word format? LaTeX? HTML? Impossible to
guess.

Nevertheless, if I can produce properly formatted code in HTML with
ease, I'm sure you can too, if you try. (And yes, I can.)

strdup() is discussed in chapter 1.

So the reference in the hash table chapter is to the strdup discussed in
chapter 1? Okay, fine - but *why*? After all, as I have already pointed
out above, the hash table code doesn't call strdup. It calls mystrdup.
These are different names.

typo.

Fixed yet?

I'll have to check on this one.

You're supposed to check on the basics of how basic algorithms work
*before* attempting to write a book about them. You have done your best
to deflect many criticisms about the quality of the code and the typos
and design decisions and so on, by saying that that's not what the book
is about - it's about explaining basic algorithms. Well, *this*
criticism drives a coach and horses through your strategy, because it
shows that you don't actually understand the algorithm you're trying to
teach.

You seem determined to pick faults.

If I were determined to pick faults, I would have gone over the chapter
with a fine-toothed comb. No, Malcolm - these faults jumped off the
screen and spat in my eye.

The problem it then becomes
distorting, like Ben Bacarisse's comments. It is then impossible to
know whether the criticisms actually represent anything legitimate or
are just fussing.

If a criticism that you don't understand the very algorithm you are
explaining is dismissed as "just fussing", then I see no hope for you.
A technical author must not just expect criticism, but be able to
analyse it, determine whether the criticism is proper (and as far as I
can tell, most and quite possibly all of the criticisms levelled at the
chapter are in fact proper and valid), and - wherever it *is* proper -
be prepared to take reasonable and prompt corrective action.

abstracted - is an entry of arbitrary size abstract enough? The keys,
yes, I admit it might have been better to go for non-strings.

I lack the perseverance to explain this to you. Let's see whether you
bother to correct any of the myriad more serious problems first.

size_t - yes, there is an obvious case, but you don't seem to have
considered the implications.

The obvious case is so powerful that the implications would have to be
drastic - and they aren't. You have tried hard to explain them to
several experts here, and convinced none of them. It may not have
occurred to you, but your inability to convince them might just be a
consequence of your arguments being wafer-thin.

malloc(size *ptr) - that's not the convention in most places. The
argument for it is based on the needs of the compiler rather than the
human reader.

On the contrary. The compiler doesn't give a damn how much you malloc.
The clc-conforming style is based on the needs of the programmer, for
whom it gives a simple and easy-to-use pattern.

verbs for function names. There is a case, but it is not obvious that
you are right. C++ doesn't agree.

C++ is irrelevant in comp.lang.c, and I await with interest your sound
logical reasons for giving a function a noun name.

indentations. Yes, it's my fault for not getting a decent HTML dumper.
If I edit the html

strdup - discussed in chapter 1.

menas - unambiguously valid criticism.

And the least important of many valid criticisms.

quadratic probing - you might be right here.

YOU SHOULD KNOW! Otherwise you have no business taking money off people
in return for teaching them how quadratic probing works! Okay, some
mistakes are inevitable - typos and stuff, even brainos - they happen.
But this is fundamental!

prime stuff - yes, that needs a bit of fixing, I will agree.

Indeed it does.

So actually most of the criticism dissolves.

No, it doesn't. You can't dissolve criticism by adding acid. You have
three choices:

1) pretend the criticisms are invalid - which fools nobody but yourself.
2) accept that they are valid, but do nothing to fix them.
3) accept that they are valid, and fix them.
4) show that they are not valid.

4) is best, but you haven't managed it here. 3) is next best, and
certainly good enough; 2) is not good, but at least it's honest; 1) is
a fake and a farce, and it seems to be the option you're choosing.

I am not saying that
there are not things you can say against the book, or that it couldn't
have been done better.

Just as well, since I've already said them, and it could. And should.

However can you point to a better introductory
web page on hash tables? That's the real test.

I think I'd struggle to find a worse one.

Richard Heathfield · Aug 18, 2007

Malcolm McLean said:

I don't see how else I could react.

By fixing the chapter.

Criticisms have been made, some of
them indisuptable bug reports, some of them very good points, some of
them defensible but not really good criticisms, some criticisms which
fail to understand the purpose of the code or the book, and some which
are objectively false.

I think you have failed to understand the purpose of the criticisms.

That's what you would expect. However there is also a sort of
assuption that the book is no good.

It seems to be a reasonable assumption, based on what I've seen of the
book.

In fact I found a bug in Ben
Pfaff's circular buffer code within a couple of minutes, but no one
was demanding that the code be taken down.

Did he agree it was a bug? If not, did he explain why? If he did agree,
did he fix the problem and publish the new code in place of the old
code? These are acceptable reactions to bug reports.

It is a good description. The writing is rather dry and my car park
metaphor is more interesting. Also I describe how to do deletions with
linear probing. However it is a vey good explanation of hashing, it
would be wrong to say it is not.

No, it isn't a good explanation, and no, it isn't wrong to say so. That
isn't how people park cars, for one thing. For another, you don't
understand the algorithm you're trying to explain.

The code works on global variables. Whilst this is acceptable for
demonstration code,

No, it isn't...

it also tends to encourage bad programming habits.

....and that's why.

There are always problems. As you say, it depends how you react to
them. There are errors and weaknesses in Basic Algorithms, of course.
However you can tell by the tone that mostly these are used as a
pretext for envy.

Pfui. What, precisely, are we supposed to be envying? The broken code,
the broken design, or the broken explanation?

Ben Bacarisse · Aug 18, 2007

Richard Heathfield said:
(and as far as I
can tell, most and quite possibly all of the criticisms levelled at the
chapter are in fact proper and valid)

In the interest of scrupulous honesty, I made one erroneous
criticism -- which I have acknowledged as such. The hash table
*relies* on the fixed allocator running out of memory (OK, that then
threw up another bug, but that is beside the point). I did not spot
this linkage, and complained about what might happen as the table
fills up. The table can't fill up. Instead, the allocator returns
NULL but, unfortunately, dereferences it first.

Ben Bacarisse · Aug 18, 2007

Malcolm McLean said:
Ben Bacarisse said:

There is very little "abstracting out". This is one thing (OT for clc
so I have not pressed it much) that I find bad about the book
(particularly so for learners). The stack (in Ch2) has no operations
at all. I would not do this, even when the code is as simple as it is
for a stack, but when the code gets non-trivial, as is the case for
testing if a circular buffer is full or empty[1], it is simply not on
-- unless explicitly set as an exercise, of course.

Click to expand...

That is actually a non-trival (trivial doesn't mean "not important",
it means something which is just a slip or a minor style issue)
criticism. There should be a discussion about the end and start
indicators merging as the queue fills up, I'd agree.
Ah!

It is odd, yes, but you know why he does it -- size_t, it seems,
infect and spreads through programs. To my mind, it spreads
documentation through the program about which objects can hold which
sets of values.

He has a point in one respect, that it precludes certain traditional
failure indications. I don't favour using negative returns in that
way but it is often done. However, size_t is there for a reason and
it does a good job. To not use it for that purpose because it
precludes a coding trick one likes is being stubborn to the point of
perversity.

Click to expand...

That's only part of my criticism. It also makes it easier to protect
from garbage. If we check (N >= 0) and N is a random garbage number,
then the check will trigger 50% of the time. If we call N with more
than two or three such garbage values the check is sure to trigger,
which is less misleading than "out of memory".
The main reason however is that size_t sizes are not good unless you
also have size_t indices. I don't like all those underscores.

Well, we disagree. I don't think either of us will change our opinion
as a result of a usenet debate with the other.

I've misunderstood the algorithm and invented something better. (I
think).

You have a higher opinion of your mistakes than I do of mine.

I'm using the square of the hash value modulus table size as
an increment. So effectively it's secondary hashing and not quadratic
probing at all.

No. It is just a bad implementation of linear probing[1]. You reduce
the actual hash value too early so it looks to me like the sequence of
slots you look at are that same for any two keys that hash to the same
slot initially. This looks like a serious error. Have I
miss-understood the code?

Good linear probing strategies either use extra bits of the full
(unreduced) hash or they keep it to add to the linear increment, thus
generating distinct sequences for keys that initially map to the same
location.

[1] The definition is not clear to me. I use the term for all probing
sequences: h(k, i) = h(k) * c * i. Some authors limit it to the case
when c == 1.

Keith Thompson · Aug 18, 2007

Malcolm McLean said:
That is a bit more useful than "the code will not compile under C99
and not link under C89, therefore the book is shoddily written",
because prototypes aren't included, sort of comment I get most of the
time.

[...]

Omitting required '#include' directives is one of the most common
errors made by inexperienced C programmers. Reinforcing this error is
not a good idea. If you present code that depends on a predefined
header, you should present the required '#include' directive as well.

websnarf · Aug 18, 2007

James said:
James said:

[...]
Speaking of Falconer, his method (linear probing
with second hash value used for probe increment)
*requires* (even more so than quadratic probing,
see upthread) prime table size.

Click to expand...

Well, not exactly: It requires only that the increment
between probes be relatively prime to the table size. A prime
table size makes this criterion easy to satisfy, but it's not
the only way. A table size equal to a power of two, for example,
works just fine if the probe increment is odd. (Degenerate
special case: pure linear probing with an increment of unity
works with any table size at all, prime or composite.)

Some people seem very interested in the relative speed
of AND vs MOD for this purpose, and on most machines AND
is faster or at the very least no slower.

This would include anyone who actually believes in mantras such as
"measure measure measure". I'm not aware of any architecture for
which this difference is speed is not at least 10 times in favor of
AND.

Oh and if there are any down sides to using power of 2 tables I am
unaware of them. Using prime sized tables ... it just looks like a
wrong decision on its face.

[...] I think, though,
that the time savings must be viewed as just one component
of the search cost -- another piece (sometimes rather large)
is computing the key's hash value in the first place, not
to mention the cost of equality comparisons on the table
items encountered along the way.

Its not the largest cost, but my timings of it shows that for certain
simplistic scenarios (using a hash table to do word counts for a text
word sequence) it has a definite measurable secondary effect.

(Slight digression): It's often suggested that those
equality comparisons can be made cheaper by storing hash
codes in the table entries along with the payload. Before
making a possibly expensive strcmp() or whatever, one can
first do a (fast) arithmetic comparison of the hashes: if
the hashes don't match, the objects mush be unequal and
it is unnecessary to make the expensive comparison. But this
stratagem doesn't seem to me to be a "Well, doh!" proposition;
at least two counter-arguments exist:

- If the table stores pointers to items rather than item
instances, storing hash values in the table roughly
doubles the amount of memory required. If the same
amount of memory were used for a pointers-only table,
the number of table slots would double and the load
factor would be halved. This should lead to shorter
searches -- and it is not at all clear _a priori_ that
more fast probes will outrace fewer slow probes.

Ridiculous. If you have n items, then its the equivalent of
increasing your memory consumption by n pointers (or perhaps ~ 2.5*n
if you want to harp on the typical worst case of average scenario
consumption). But you already know that you have memory for n items
in the first place. This complaint is only valid if you think adding
an extra pointer per item is an unacceptable memory cost. This is
unlikely to be the case for most types of items that you would care to
store in a hash table.

- Storing the hash codes saves time whenever a probe hits
a slot occupied by the "wrong" key (it saves no time when
the probe finds the "right" key). If the time savings is
significant, it must mean that the key comparisons are
very expensive or that the probe sequences are very long.
If we've taken care to construct good hash functions and
have reason to believe that searches take two-point-mumble
probes, on the average, the savings is only one-point-mumble
comparisons per search.

For medium complexity items (such as a string), a properly implemented
hash table has absolutely no perceivable overhead on top of the hash
computation and comparison functions.

So if you go by a table doubling resizing strategy, and you tend not
to delete a lot, then on at least some scenarios I found the average
seek length to be about 1.4 slots per item. By not having a stored
hash value, you simply make 40% more comparison calls. This is a huge
cost for any well designed hash table, of non-trivial items.

As always when one is interested in speed, the gold standard
is the familiar "Measure, measure, measure!"

This comment is so ironic coming from you. All my claims are
empirically backed, yet you are just repeating these same false claims
you've always made, which of course, you never would if you ever
bothered to measure any of this stuff for yourself.

Keith Thompson · Aug 18, 2007

Malcolm McLean said:
There are always problems. As you say, it depends how you react to them.
There are errors and weaknesses in Basic Algorithms, of
course. However you can tell by the tone that mostly these are used as
a pretext for envy.

Envy?? That's completely delusional. Nobody here is envious of you
or your book. Why on Earth would we be?

Eric Sosman · Aug 18, 2007

This comment is so ironic coming from you. All my claims are
empirically backed, yet you are just repeating these same false claims
you've always made, which of course, you never would if you ever
bothered to measure any of this stuff for yourself.

Thank you for informing me what I do and don't measure.
I'm always glad to hear from someone who knows more about
my activities than I do myself.

Malcolm McLean · Aug 18, 2007

Keith Thompson said:
Malcolm McLean said:

That is a bit more useful than "the code will not compile under C99
and not link under C89, therefore the book is shoddily written",
because prototypes aren't included, sort of comment I get most of the
time.

Click to expand...

[...]

Omitting required '#include' directives is one of the most common
errors made by inexperienced C programmers. Reinforcing this error is
not a good idea. If you present code that depends on a predefined
header, you should present the required '#include' directive as well.

The problem is the code is interspersed with the text. It might be an idea
to #include everythign I use in a chapter right at the begining, however.

Malcolm McLean · Aug 18, 2007

Keith Thompson said:
Envy?? That's completely delusional. Nobody here is envious of you
or your book. Why on Earth would we be?

I don't include you in that.

CBFalconer · Aug 18, 2007

CBFalconer said:
CBFalconer said:

Richard Harter wrote:

Click to expand...

.... snip ...

[...] that always using the same probe increment in a
power of two table size also has bad properties. His
solution is to use the unused bits of the full hash to
select an increment size from a table of primes.

Click to expand...

That is dangerous. The purpose of the multiple hash is to
have different search patterns for different objects that
have the same primary hash pattern.

Click to expand...

No, its to have a different search pattern for different object
that have the same primary hash *INDEX* (i.e., post masking).
This is obvious since, for any good hash function, index
collisions will be many many times more common than full hash
function collisions.

Mathematically speaking, on average a single 32-bit (good) hash
function collision will occur roughly when 65536 entries appear,
while a hash index collision will occur roughly when the square
root of the table size number of entries occur (so for a 9
million entry table, that's just 3000 entries).

No, you are ignoring the size of the hash table itself. For
example, the starting size for hashlib is 17, so no more than 17
hash values (modulo 17) are useful. This changes as the table
enlarges itself.

You are ignoring the fact that skip values of 2, 4 and 6, for
example, for indexes starting on even offsets near each other
will tend to collide with each other as the table fills up.

Same skip-value, and commensurate offset collisions must be
minimized by increasing the chance they miss each other. This
happens most easily if the skip values are predominantly co-prime,
which is easiest to construct when they are simply taken from a
list of prime numbers.

Again, you are ignoring the effect of varying table size.

CBFalconer · Aug 18, 2007

Malcolm said:
.... snip ...

Too hard to read. malloc( sizeof *ptr) is a bad clc ism that you
will seldom see in another environment. The reader has to perform
a mental dereference which makes it harder to see if the size of
correct.

Nonsense. "ptr = malloc(sizeof *ptr);" is automatically the right
size. No checking of any form is required.

Chris Dollin · Aug 18, 2007

This would include anyone who actually believes in mantras such as
"measure measure measure". I'm not aware of any architecture for
which this difference is speed is not at least 10 times in favor of
AND.

The question is not whether AND is faster than MOD; it's whether
hash-tables implemented with AND as the reduction method are faster
than broadly-similar hashtables using MOD.

If I recall correctly, MOD is said to be better than AND because
more of the bits of the hash contribute to the index. If the hash
function is really good, this won't matter -- but we don't always
get to use "really good" hash functions.

I'm not saying that tables-using-AND are no faster than
tables-using-MOD; I'm saying that "AND is faster than MOD" doesn't
justify tuA faster than tuM.

CBFalconer · Aug 18, 2007

Flash said:
Malcolm McLean wrote, On 18/08/07 10:58:
.... snip ...

For which you have not published an errata so that your audience
know about them.

For contrast, look at my hashlib. It was published complete with
the test suite and test programs that evaluated it during and after
development. I have had two and a half bug reports since the
initial publication, and no more for years.

Keith Thompson · Aug 18, 2007

Malcolm McLean said:
Keith Thompson said:

Malcolm McLean said:

That is a bit more useful than "the code will not compile under C99
and not link under C89, therefore the book is shoddily written",
because prototypes aren't included, sort of comment I get most of the
time.

Click to expand...

[...]

Omitting required '#include' directives is one of the most common
errors made by inexperienced C programmers. Reinforcing this error is
not a good idea. If you present code that depends on a predefined
header, you should present the required '#include' directive as well.

Click to expand...

The problem is the code is interspersed with the text. It might be an
idea to #include everythign I use in a chapter right at the begining,
however.

Or you show the required '#include' directives at the top of each code
sample. For example:

#include <stdio.h>
...
void hello(void)
{
printf("Hello, world\n");
}

where the "..." indicates that the directive needs to be there, but it
doesn't necessarily immediately precede the function definition. You
could explain this convention in the text.

Adding 2 or 3 lines to each code sample doesn't seem excessive.

Keith Thompson · Aug 18, 2007

Malcolm McLean said:
I don't include you in that.

That's a good start. Don't include anyone else, and you'll be
approaching the truth.

Malcolm McLean · Aug 18, 2007

Keith Thompson said:
Malcolm McLean said:

Keith Thompson said:

[...]
That is a bit more useful than "the code will not compile under C99
and not link under C89, therefore the book is shoddily written",
because prototypes aren't included, sort of comment I get most of the
time.
[...]

Omitting required '#include' directives is one of the most common
errors made by inexperienced C programmers. Reinforcing this error is
not a good idea. If you present code that depends on a predefined
header, you should present the required '#include' directive as well.

Click to expand...

The problem is the code is interspersed with the text. It might be an
idea to #include everythign I use in a chapter right at the begining,
however.

Click to expand...

Or you show the required '#include' directives at the top of each code
sample. For example:

#include <stdio.h>
...
void hello(void)
{
printf("Hello, world\n");
}

where the "..." indicates that the directive needs to be there, but it
doesn't necessarily immediately precede the function definition. You
could explain this convention in the text.

Adding 2 or 3 lines to each code sample doesn't seem excessive.

Actually chapter 2 is untypical. Most present a module of related functions.
So I could present a header for for each chapter.
What I am reluctant to do is to present too much of the code as "finished".
It is not meant to be a library with documentation, although some of the
chapters get that way. It's meant to show how things work, to demystify the
functions that people call in everyday programming, or applications they
use.

How to implement a html parser in java?	1	Dec 28, 2023
hash table - simple implementation	16	Oct 17, 2012
Database Manager: A C++ Console Application	14	May 12, 2025
How can I find out the id of the first record in a database table, assuming record 1 is deleted?	2	Aug 27, 2025
Unable to read input from keyboard, in below C code, for a BST.	0	Jul 20, 2025
I made a blockchain and want to make a cryptocurrency, but my code doesn't verify hash of each block	2	Jun 2, 2024
Become a C++ programmer	6	Feb 21, 2023
How to get started with C programming:	1	Aug 7, 2023

How to implement a Hash Table in C

Ben Bacarisse

Malcolm McLean

Ben Bacarisse

Richard Heathfield

Richard Heathfield

Ben Bacarisse

Ben Bacarisse

Keith Thompson

websnarf

Keith Thompson

Eric Sosman

Malcolm McLean

Malcolm McLean

CBFalconer

CBFalconer

Chris Dollin

CBFalconer

Keith Thompson

Keith Thompson

Malcolm McLean

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads