How to implement a Hash Table in C

Kelsey Bjarnason · Aug 19, 2007

Malcolm McLean said:

Does anyone else here apart from Malcolm have difficulty reading it?

This is one of those things which, like migrating to x++ from x = x + 1,
simply takes getting used to. It is idiomatic in what one might term
"good C code", not to mention around here, but a rank newbie - i.e.
someone who should not be reading MM's book in the first place - might not
grok it immediately.

Once grokked, however, it is a perfectly sensible way to do the job, for
the usual reasons.

Kelsey Bjarnason · Aug 19, 2007

The objection to

ptr = malloc(sizeof *ptr);

is purely syntactical.

It is? Why? Is there a syntax error in the code?

Keith Thompson · Aug 19, 2007

Malcolm McLean said:
The objection to

ptr = malloc(sizeof *ptr);

is purely syntactical. As Keith said, if sizeof were spelt $ and given
normal operator rules for whitespace

ptr = malloc($*ptr);

would be OK. The * wouldn't look like multiplication of an identifier.

As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:

ptr = malloc(sizeof(*ptr));

Do you have any objection, "syntactical" or otherwise, to that form?

Malcolm McLean · Aug 19, 2007

Keith Thompson said:
As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:

ptr = malloc(sizeof(*ptr));

Do you have any objection, "syntactical" or otherwise, to that form?

I prefer it. It is not uncommon to malloc more than one data item.

so ptr = malloc( N * sizeof *ptr);

if less clear than
ptr = malloc(N * sizeof(*ptr));

unless you are one of those people who likes to put the test in the line

if( ptr = malloc(N * sizeof(*ptr)) );

then some compilers like

if( (ptr = malloc(N * sizeof(*ptr))) )

at which point we've broken the rule of three and lost readbility.

Ed Jensen · Aug 19, 2007

Malcolm McLean said:
I prefer it. It is not uncommon to malloc more than one data item.

so ptr = malloc( N * sizeof *ptr);

if less clear than
ptr = malloc(N * sizeof(*ptr));

unless you are one of those people who likes to put the test in the line

if( ptr = malloc(N * sizeof(*ptr)) );

then some compilers like

if( (ptr = malloc(N * sizeof(*ptr))) )

at which point we've broken the rule of three and lost readbility.

There's always:

if (ptr = malloc(N * sizeof *ptr), ptr == NULL) ...

Keith Thompson · Aug 19, 2007

Ed Jensen said:
There's always:

if (ptr = malloc(N * sizeof *ptr), ptr == NULL) ...

Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.

Ed Jensen · Aug 19, 2007

Keith Thompson said:
Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.

Good idea, since your one liner would fail to compile.

Eric Sosman · Aug 19, 2007

Richard said:
[...[
This is one of the reasons
twin primes are popular as table sizes: you use the hash code
mod N for the first probe, and then if necessary you use the
hash mod (N-2) plus 1 as the increment for further probes. The
primality of N guarantees that the increment is relatively prime
to it; the primality of N-2 helps scramble groups of keys that
have simple relationships (sum1, sum2, sum3, ...).

Click to expand...

I'm not clear as to why the other sibling in a pair twin prime
should be superior for scrambling as compared to using some other
prime.

Any prime will scramble about as well as any other (I'm
speaking very loosely here, of course), but a larger prime
will yield more different remainders than a smaller one --
for example, consider K % 65537 vs. K % 2.

pete · Aug 20, 2007

Keith said:
Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.

I recall Eric Sosman advising me that in the case of loops,
it's probably better to write something like

while((ptr = malloc(N * sizeof *ptr) != NULL) {
/**/
}

than it is to write

ptr = malloc(N * sizeof *ptr);
while(ptr != NULL) {
/**/
ptr = malloc(N * sizeof *ptr);
}

because it makes it more obvious that there is only one expression
generating values for ptr.

Eric Sosman · Aug 20, 2007

pete said:
I recall Eric Sosman advising me that in the case of loops,
it's probably better to write something like

while((ptr = malloc(N * sizeof *ptr) != NULL) {
/**/
}

I hope I didn't advise exactly that! (Hint: count
the parentheses.

than it is to write

ptr = malloc(N * sizeof *ptr);
while(ptr != NULL) {
/**/
ptr = malloc(N * sizeof *ptr);
}

because it makes it more obvious that there is only one expression
generating values for ptr.

General principle: If something is written in only one
place, that one place is authoritative -- it may be right or
it may be wrong, but it can be judged and possibly corrected
on its own. If something is written in two places, neither
is authoritative and there can be a question about which is
right, and if both are wrong but only one gets corrected, ...

Or, as an older adage has it: "A man with two watches
never knows the correct time."

pete · Aug 20, 2007

Eric said:
I hope I didn't advise exactly that! (Hint: count
the parentheses.

General principle: If something is written in only one
place, that one place is authoritative -- it may be right or
it may be wrong, but it can be judged and possibly corrected
on its own. If something is written in two places, neither
is authoritative and there can be a question about which is
right, and if both are wrong but only one gets corrected, ...

Or, as an older adage has it: "A man with two watches
never knows the correct time."

Thank you.

Keith Thompson · Aug 20, 2007

Ed Jensen said:
Good idea, since your one liner would fail to compile.

D'oh! Thanks for the correction:

if ((ptr = malloc(N * sizeof *ptr)) != NULL) ...

Chris Dollin · Aug 20, 2007

Well, much of the industry seems to have accepted the analysis shown
above, so I don't know what else to tell you.

"The analysis shown above" [1] was not part of your presented
argument; if it's necessary to your argument, you should have
mentioned it /first/, not spent an eternity backing down to it.

Now I have somethings to read.

[1] Not shown, just mentioned.

Malcolm McLean · Aug 20, 2007

Eric Sosman said:
I hope I didn't advise exactly that! (Hint: count
the parentheses.

That's th snag, isn't it. Just add a tiny bit of visual complexity and
people make errors in the expression. Not just once but twice.

Do you really want to exhaust the computer's memory? If not, the loop must
not terminate at the while() in normal flow control.
It should be

while( allocationsleftotodo() )
{
ptr = malloc(N * sizeof *ptr);
if(!ptr)
/* oh dear. take emergency action */
/* normal code */
}

pete · Aug 20, 2007

Malcolm said:
That's th snag, isn't it. Just add a tiny bit of visual complexity and
people make errors in the expression. Not just once but twice.

Do you really want to exhaust the computer's memory? If not, the loop must
not terminate at the while() in normal flow control.
It should be

while( allocationsleftotodo() )
{
ptr = malloc(N * sizeof *ptr);
if(!ptr)
/* oh dear. take emergency action */
/* normal code */
}

The context of Eric's original advice,
was not allocation, as it is here.
The topic that I mean to address was:
assignment within a loop controling expression versus
splitting the assignment into a separate line.

CBFalconer · Aug 20, 2007

Keith said:
D'oh! Thanks for the correction:

if ((ptr = malloc(N * sizeof *ptr)) != NULL) ...

However this is a non-critical typo, since the compiler will always
catch it. The nuisance is when there are multiple such and the
compiler stops on the first error.

CBFalconer · Aug 20, 2007

Malcolm said:
.... snip ...

while( allocationsleftotodo() )
{
ptr = malloc(N * sizeof *ptr);
if(!ptr)
/* oh dear. take emergency action */
/* normal code */
}

Simpler, more readable, and more direct, is:

while (allocationsleftotodo()) {
if (!(ptr = malloc(N * sizeof *ptr))) {
/* oh dear. take emergency action */
}
else {
/* all well, take normal action */
}
}

which also allows eliding of the else to allow for 'emergency
action' making 'normal action' feasable. Note that the key event
is the success or failure of the malloc, and the structure makes
that evident. There is no searching to discover what the state of
ptr represents.

Philip Potter · Aug 20, 2007

Richard said:
Malcolm McLean said:

Does anyone else here apart from Malcolm have difficulty reading it?
Hands up please.

Like others have said, I found it confusing at first, but once I had
seen it once that was all I needed. It *is* different and idiomatic and
uncommon, but it doesn't take much to learn what it means and why it is
helpful.

Compare it to other language features such as pointers, unions, and
setjmp/longjmp, which take longer to learn and longer still to use well.
Having seen this construct once, one can use it successfully.

Phil

Philip Potter · Aug 20, 2007

Malcolm said:
I've misunderstood the algorithm and invented something better. (I
think). I'm using the square of the hash value modulus table size as an
increment. So effectively it's secondary hashing and not quadratic
probing at all.

I honestly can't believe you just said this. Hashing functions and
algorithms, like random number generators, are not to be guessed at, and
intuitively inspected for quality. If you're going to say "I think I've
invented something better" you ought to have some sort of evidence to
back this up!

Phil

websnarf · Aug 20, 2007

I've misunderstood the algorithm and invented something better. (I think).
I'm using the square of the hash value modulus table size as an increment.
So effectively it's secondary hashing and not quadratic probing at all.

As pointed out by Philip Potter, you can't be so cavalier about such
things. If the index is 0 or 1, then squaring it is going to give you
0 and 1 as the new index all the time. Furthermore, depending on the
table size, you can get all sorts of "short cycles". For example 2*2
== 4 (mod 7) and 4*4 == 16 == 2 (mod 7). So 2 and 4 just cycle back
and forth in a table of sized 7.

While that analysis from academia (specifically Knuth, but others
reproduce it) is not very generally applicable to *real world* hashing
(i.e., hashing keys/objects which are *not* integers) under their
assumptions this analysis is fairly thorough. Also the quadratic
probing technique has the advantage that you can compute the offsets
quickly using finite differences. (Your self-squaring technique,
requires the multiplier anyways, which has only very recently become
fast in all CPUs.)

hash table - simple implementation	16	Oct 17, 2012
Become a C++ programmer	5	Feb 21, 2023
How to get started with C programming:	1	Aug 7, 2023
Hash search and Interpolation search ?	3	Dec 30, 2022
How can I create a table using the input element?	1	Mar 25, 2022
How can I view / open / render / display a pdf file with c code?	0	Sep 23, 2023
Hash table Implementation	3	Mar 29, 2011
How can I find occurrences of a column name FPPaymentID in the entire database (e.g table, stored procedure etc) in SSMS?	2	Jun 20, 2023

How to implement a Hash Table in C

Kelsey Bjarnason

Kelsey Bjarnason

Keith Thompson

Malcolm McLean

Ed Jensen

Keith Thompson

Ed Jensen

Eric Sosman

pete

Eric Sosman

pete

Keith Thompson

Chris Dollin

Malcolm McLean

pete

CBFalconer

CBFalconer

Philip Potter

Philip Potter

websnarf

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads