How to implement a Hash Table in C

K

Kelsey Bjarnason

Malcolm McLean said:


Does anyone else here apart from Malcolm have difficulty reading it?

This is one of those things which, like migrating to x++ from x = x + 1,
simply takes getting used to. It is idiomatic in what one might term
"good C code", not to mention around here, but a rank newbie - i.e.
someone who should not be reading MM's book in the first place - might not
grok it immediately.

Once grokked, however, it is a perfectly sensible way to do the job, for
the usual reasons.
 
K

Keith Thompson

Malcolm McLean said:
The objection to

ptr = malloc(sizeof *ptr);

is purely syntactical. As Keith said, if sizeof were spelt $ and given
normal operator rules for whitespace

ptr = malloc($*ptr);

would be OK. The * wouldn't look like multiplication of an identifier.

As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:

ptr = malloc(sizeof(*ptr));

Do you have any objection, "syntactical" or otherwise, to that form?
 
M

Malcolm McLean

Keith Thompson said:
As I said, if you really think it's such a problem, you *can* put
partentheses around the operand:

ptr = malloc(sizeof(*ptr));

Do you have any objection, "syntactical" or otherwise, to that form?
I prefer it. It is not uncommon to malloc more than one data item.

so ptr = malloc( N * sizeof *ptr);

if less clear than
ptr = malloc(N * sizeof(*ptr));

unless you are one of those people who likes to put the test in the line

if( ptr = malloc(N * sizeof(*ptr)) );

then some compilers like

if( (ptr = malloc(N * sizeof(*ptr))) )

at which point we've broken the rule of three and lost readbility.
 
E

Ed Jensen

Malcolm McLean said:
I prefer it. It is not uncommon to malloc more than one data item.

so ptr = malloc( N * sizeof *ptr);

if less clear than
ptr = malloc(N * sizeof(*ptr));

unless you are one of those people who likes to put the test in the line

if( ptr = malloc(N * sizeof(*ptr)) );

then some compilers like

if( (ptr = malloc(N * sizeof(*ptr))) )

at which point we've broken the rule of three and lost readbility.

There's always:

if (ptr = malloc(N * sizeof *ptr), ptr == NULL) ...
 
K

Keith Thompson

Ed Jensen said:
There's always:

if (ptr = malloc(N * sizeof *ptr), ptr == NULL) ...

Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.
 
E

Ed Jensen

Keith Thompson said:
Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.

Good idea, since your one liner would fail to compile.
 
E

Eric Sosman

Richard said:
[...[
This is one of the reasons
twin primes are popular as table sizes: you use the hash code
mod N for the first probe, and then if necessary you use the
hash mod (N-2) plus 1 as the increment for further probes. The
primality of N guarantees that the increment is relatively prime
to it; the primality of N-2 helps scramble groups of keys that
have simple relationships (sum1, sum2, sum3, ...).

I'm not clear as to why the other sibling in a pair twin prime
should be superior for scrambling as compared to using some other
prime.

Any prime will scramble about as well as any other (I'm
speaking very loosely here, of course), but a larger prime
will yield more different remainders than a smaller one --
for example, consider K % 65537 vs. K % 2.
 
P

pete

Keith said:
Or just:

if ((ptr = malloc(N * sizeof *ptr) != NULL) ...

I'd probably just put the test on a separate line.

I recall Eric Sosman advising me that in the case of loops,
it's probably better to write something like

while((ptr = malloc(N * sizeof *ptr) != NULL) {
/**/
}

than it is to write

ptr = malloc(N * sizeof *ptr);
while(ptr != NULL) {
/**/
ptr = malloc(N * sizeof *ptr);
}

because it makes it more obvious that there is only one expression
generating values for ptr.
 
E

Eric Sosman

pete said:
I recall Eric Sosman advising me that in the case of loops,
it's probably better to write something like

while((ptr = malloc(N * sizeof *ptr) != NULL) {
/**/
}

I hope I didn't advise exactly that! (Hint: count
the parentheses.
than it is to write

ptr = malloc(N * sizeof *ptr);
while(ptr != NULL) {
/**/
ptr = malloc(N * sizeof *ptr);
}

because it makes it more obvious that there is only one expression
generating values for ptr.

General principle: If something is written in only one
place, that one place is authoritative -- it may be right or
it may be wrong, but it can be judged and possibly corrected
on its own. If something is written in two places, neither
is authoritative and there can be a question about which is
right, and if both are wrong but only one gets corrected, ...

Or, as an older adage has it: "A man with two watches
never knows the correct time."
 
P

pete

Eric said:
I hope I didn't advise exactly that! (Hint: count
the parentheses.


General principle: If something is written in only one
place, that one place is authoritative -- it may be right or
it may be wrong, but it can be judged and possibly corrected
on its own. If something is written in two places, neither
is authoritative and there can be a question about which is
right, and if both are wrong but only one gets corrected, ...

Or, as an older adage has it: "A man with two watches
never knows the correct time."

Thank you.
 
C

Chris Dollin

Well, much of the industry seems to have accepted the analysis shown
above, so I don't know what else to tell you.

"The analysis shown above" [1] was not part of your presented
argument; if it's necessary to your argument, you should have
mentioned it /first/, not spent an eternity backing down to it.

Now I have somethings to read.

[1] Not shown, just mentioned.
 
M

Malcolm McLean

Eric Sosman said:
I hope I didn't advise exactly that! (Hint: count
the parentheses.
That's th snag, isn't it. Just add a tiny bit of visual complexity and
people make errors in the expression. Not just once but twice.

Do you really want to exhaust the computer's memory? If not, the loop must
not terminate at the while() in normal flow control.
It should be

while( allocationsleftotodo() )
{
ptr = malloc(N * sizeof *ptr);
if(!ptr)
/* oh dear. take emergency action */
/* normal code */
}
 
P

pete

Malcolm said:
That's th snag, isn't it. Just add a tiny bit of visual complexity and
people make errors in the expression. Not just once but twice.

Do you really want to exhaust the computer's memory? If not, the loop must
not terminate at the while() in normal flow control.
It should be

while( allocationsleftotodo() )
{
ptr = malloc(N * sizeof *ptr);
if(!ptr)
/* oh dear. take emergency action */
/* normal code */
}

The context of Eric's original advice,
was not allocation, as it is here.
The topic that I mean to address was:
assignment within a loop controling expression versus
splitting the assignment into a separate line.
 
C

CBFalconer

Keith said:
D'oh! Thanks for the correction:

if ((ptr = malloc(N * sizeof *ptr)) != NULL) ...

However this is a non-critical typo, since the compiler will always
catch it. The nuisance is when there are multiple such and the
compiler stops on the first error.
 
C

CBFalconer

Malcolm said:
.... snip ...

while( allocationsleftotodo() )
{
ptr = malloc(N * sizeof *ptr);
if(!ptr)
/* oh dear. take emergency action */
/* normal code */
}

Simpler, more readable, and more direct, is:

while (allocationsleftotodo()) {
if (!(ptr = malloc(N * sizeof *ptr))) {
/* oh dear. take emergency action */
}
else {
/* all well, take normal action */
}
}

which also allows eliding of the else to allow for 'emergency
action' making 'normal action' feasable. Note that the key event
is the success or failure of the malloc, and the structure makes
that evident. There is no searching to discover what the state of
ptr represents.
 
P

Philip Potter

Richard said:
Malcolm McLean said:

Does anyone else here apart from Malcolm have difficulty reading it?
Hands up please.

Like others have said, I found it confusing at first, but once I had
seen it once that was all I needed. It *is* different and idiomatic and
uncommon, but it doesn't take much to learn what it means and why it is
helpful.

Compare it to other language features such as pointers, unions, and
setjmp/longjmp, which take longer to learn and longer still to use well.
Having seen this construct once, one can use it successfully.

Phil
 
P

Philip Potter

Malcolm said:
I've misunderstood the algorithm and invented something better. (I
think). I'm using the square of the hash value modulus table size as an
increment. So effectively it's secondary hashing and not quadratic
probing at all.

I honestly can't believe you just said this. Hashing functions and
algorithms, like random number generators, are not to be guessed at, and
intuitively inspected for quality. If you're going to say "I think I've
invented something better" you ought to have some sort of evidence to
back this up!

Phil
 
W

websnarf

I've misunderstood the algorithm and invented something better. (I think).
I'm using the square of the hash value modulus table size as an increment.
So effectively it's secondary hashing and not quadratic probing at all.

As pointed out by Philip Potter, you can't be so cavalier about such
things. If the index is 0 or 1, then squaring it is going to give you
0 and 1 as the new index all the time. Furthermore, depending on the
table size, you can get all sorts of "short cycles". For example 2*2
== 4 (mod 7) and 4*4 == 16 == 2 (mod 7). So 2 and 4 just cycle back
and forth in a table of sized 7.

While that analysis from academia (specifically Knuth, but others
reproduce it) is not very generally applicable to *real world* hashing
(i.e., hashing keys/objects which are *not* integers) under their
assumptions this analysis is fairly thorough. Also the quadratic
probing technique has the advantage that you can compute the offsets
quickly using finite differences. (Your self-squaring technique,
requires the multiplier anyways, which has only very recently become
fast in all CPUs.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,772
Messages
2,569,593
Members
45,113
Latest member
Vinay KumarNevatia
Top