hash table

A

al

Here's what I am trying to write a simple hash table:

struct Employee
{
char name[30];
int id;
char department[10];
float salary;
};

struct Employee *hash_array[MAX_SIZE];

hash_array[n] = (struct Employee*)malloc(sizeof(Employee));
strcpy(*hash_array[n].name, "John Smith");
*hash_array[n].id = 1234;
*hash_array[n].department = "Marketing";
*hash_array[n].salary = 4000;
....

hash_array will contain all the pointers pointing to each object of Employee
type.

My questions are:

How to determine MAX_SIZE above?
How to calculate index, n, to locate an appropriate array item? I read
somewhere prime number can be used for such purpose(build a hash function).
Could you show me how this work?

Thanks!
 
L

Leo Custodio

If you are trying to build a simple hash table, why not use mod? (%)

Leo Custodio
(e-mail address removed)

--
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: 2.6.2

mQCNAz/nyswAAAEEAM1Jl14YqNlrUGmr4vh5OKGbDg5qiFnY/Ioqa5j5j9jlTsiH
7EJNlhIvu5OV223D0REUmWbFaKBQlnZAaDRRROb52YPuZ8NQfyu/C5zvTz8qubEx
jWn+nYryqKZxQsDwjntkNIMxx5n+QB7WhDltenCFE/VxYhsTa59EWqUqkz/RAAUR
tC5MZW9uYXJkbyBDLiBDdXN0b2RpbyA8YWxpZW5zcHJpdGVAaG90bWFpbC5jb20+
=xAh5
-----END PGP PUBLIC KEY BLOCK-----
 
P

Peter Pichler

al said:
Here's what I am trying to write a simple hash table:

struct Employee
{
char name[30];
int id;
char department[10];
float salary;
};

struct Employee *hash_array[MAX_SIZE];

hash_array[n] = (struct Employee*)malloc(sizeof(Employee));

Do not cast return from malloc(). It buys you nothing and can actually hide
strcpy(*hash_array[n].name, "John Smith");
*hash_array[n].id = 1234;
*hash_array[n].department = "Marketing";
*hash_array[n].salary = 4000;
...

hash_array will contain all the pointers pointing to each object of Employee
type.

My questions are:

How to determine MAX_SIZE above?

It depends on your hashing function. How do you calculate the hashing value?
How many possible values are there going to be? That would be your MAX_SIZE.
How to calculate index, n, to locate an appropriate array item? I read
somewhere prime number can be used for such purpose(build a hash function).
Could you show me how this work?

This is really not a C question, so it is outside the scope of this
newsgroup. You didn't even tell us what you want to base the hashing on. Is
it the name, the id, or even the salary? No, don't bother answering, take
this question to comp.programming where (I think) it is more topical.

Hope this helps,

Peter
 
C

CBFalconer

al said:
Here's what I am trying to write a simple hash table:

struct Employee
{
char name[30];
int id;
char department[10];
float salary;
};

struct Employee *hash_array[MAX_SIZE];

hash_array[n] = (struct Employee*)malloc(sizeof(Employee));
strcpy(*hash_array[n].name, "John Smith");
*hash_array[n].id = 1234;
*hash_array[n].department = "Marketing";
*hash_array[n].salary = 4000;
...

hash_array will contain all the pointers pointing to each object
of Employee type.

My questions are:

How to determine MAX_SIZE above?
How to calculate index, n, to locate an appropriate array item?
I read somewhere prime number can be used for such purpose(build
a hash function). Could you show me how this work?

This is the sort of thing I wrote hashlib for. See:

<http://cbfalconer.home.att.net/download/>

which should reduce your problems to determining suitable hash
functions and initializing and destroying struct Employee types.
hashlib will function with extremely poor hash functions, albeit
not as well as with good hash functions.
 
J

Jack Klein

On Fri, 02 Jan 2004 23:04:04 GMT, in comp.lang.c "Leo Custodio"

If you are trying to build a simple hash table, why not use mod? (%)

Leo Custodio
(e-mail address removed)

--
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: 2.6.2

mQCNAz/nyswAAAEEAM1Jl14YqNlrUGmr4vh5OKGbDg5qiFnY/Ioqa5j5j9jlTsiH
7EJNlhIvu5OV223D0REUmWbFaKBQlnZAaDRRROb52YPuZ8NQfyu/C5zvTz8qubEx
jWn+nYryqKZxQsDwjntkNIMxx5n+QB7WhDltenCFE/VxYhsTa59EWqUqkz/RAAUR
tC5MZW9uYXJkbyBDLiBDdXN0b2RpbyA8YWxpZW5zcHJpdGVAaG90bWFpbC5jb20+
=xAh5
-----END PGP PUBLIC KEY BLOCK-----
al said:
Here's what I am trying to write a simple hash table:

struct Employee
{
char name[30];
int id;
char department[10];

[snip]

I will try this again. Do not top post. New material you add belongs
after or interspersed with quoted material you are replying to. And
get rid of the PGP garbage, it has nothing to do with usenet.

[posted & mailed]

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
 
A

al

Peter Pichler said:
al said:
Here's what I am trying to write a simple hash table:

struct Employee
{
char name[30];
int id;
char department[10];
float salary;
};

struct Employee *hash_array[MAX_SIZE];

hash_array[n] = (struct Employee*)malloc(sizeof(Employee));

Do not cast return from malloc(). It buys you nothing and can actually hide
serious errors if you forget to #include <stdlib.h>. Moreover, the fewer
useless casts in your code the easier it is to read and maintain.
Thanks!

If don't use cast here, then how to convert the void* pointer to Employee *?

What serious errors could such casting cause?
 
B

Ben Pfaff

al said:
If don't use cast here, then how to convert the void* pointer to Employee *?

The compiler will do the conversion automatically upon
assignment, as in
Employee *e;
e = malloc (sizeof *e);
What serious errors could such casting cause?

It could mask a failure to #include <stdlib.h>, which yields
undefined behavior.
 
P

Peter Pichler

news:GEnJb.584711$0v4.23244818@bgtnsc04-> Peter Pichler said:
al said:
hash_array[n] = (struct Employee*)malloc(sizeof(Employee));

Do not cast return from malloc(). It buys you nothing and can actually hide
serious errors if you forget to #include <stdlib.h>. Moreover, the fewer
useless casts in your code the easier it is to read and maintain.
Thanks!

If don't use cast here, then how to convert the void* pointer to Employee
*?

In C, void* is always implicitly converted to a pointer to any data type, so
the cast is unnecessary.
What serious errors could such casting cause?

When you miss the prototype, in C89 (the older version of the standard) the
return type is implicitly considered to be int. This is plainly wrong in
case of malloc() that returns void*. Even if sizeof(int)==sizeof(void*), the
value may be returned in a different way, so you are not guaranteed to get
anything sensible. Hence UB (undefined behaviour). Casting does not /cause/
such problems, but it may /hide/ them.

Peter
 
P

Peter Pichler

Peter Pichler said:
When you miss the prototype, in C89 (the older version of the standard) the
return type is implicitly considered to be int.
<snip>

I forgot to mention that in the newer version of the standard, C99, implicit
int was made obsolete, so lack of prototypes requires diagnostics. Just FYI.

Peter
 
M

Malcolm

al said:
If don't use cast here, then how to convert the void* pointer to
Employee *?

What serious errors could such casting cause?
C allows an implicit conversion from void * to another pointer type, whilst
C++ demands a cast. If there is any chance that your code will be run
through a C++ compiler for any reason then you need to cast.
Most clc regs are of the opinion that a cast is a bad idea in definite C
source, because it could mask a failure to include stdlib.h - the cast will
suppress warnings about assuming that malloc() returns an int because there
is no prototype in scope.

On hash tables - this is more an algorithm that an C issue. However I will
explain the principle briefly.

A hash table is a way of accessing information very quickly from a key (a
unique identifier) when entries are constantly being added to the database.

The idea is that you take the unique key (eg a car registration, an employee
id) and apply a hash function to it to convert the key into an index into
the look-up table.

The problem comes when two keys hash to the same value. To resolve this
situation, you employ a parking-lot algorithm - you try the next slot up and
then the next, until you find a free slot. This also adds complexity to the
seek - you need to test the hash value, check the key, and if it doesn't
match work upwards until you find your data.

If you come to an empty slot, you know that there is no matching entry.
There are also problems in removing data, which I won't go into.

As a rule of thumb you need about twice as many slots in your hash tablae as
you have data items, to prevent degenerate behaviour when you get too many
collisions.
 
P

Peter Nilsson

al said:
Here's what I am trying to write a simple hash table:

struct Employee
{
char name[30];
int id;
char department[10];
float salary;
};

struct Employee *hash_array[MAX_SIZE];
hash_array[n] = (struct Employee*)malloc(sizeof(Employee));
strcpy(*hash_array[n].name, "John Smith");
*hash_array[n].id = 1234;
*hash_array[n].department = "Marketing";
*hash_array[n].salary = 4000;
...

hash_array will contain all the pointers pointing to each object of
Employee type.

My questions are:

How to determine MAX_SIZE above?
How to calculate index, n, to locate an appropriate array item? I read
somewhere prime number can be used for such purpose(build a hash
function).
Could you show me how this work?

Google for hashing techniques. It's a non-trivial subject and largely
outside the domain of clc since it is not ISO C specific.

The one question (which you didn't ask) which clc certainly can help you
with is "How do get this to compile!"

The . operator has a higher precedence than unary *, so an expression
like...

*hash_array[n].id

....is parsed as...

*(hash_array[n].id)

But hash_array[n] is not a struct, it's a pointer to struct so the '.id'
part should cause your compiler to question your code. What you want is...

hash_array[n]->id

Also, you're code...

strcpy(*hash_array[n].name, "John Smith");

....even after adding the above correction, attempts to copy data through an
uninitialised pointer 'name'.

Note that C is not C++ (don't use a C++ compiler to compile C; at least, not
when you don't know what you're doing). Declaring a struct tag will not make
that tag a stand alone type name. You're sizeof(Employee) is invalid without
an
appropriate typedef (say)...

typedef struct Employee Employee;

That said, you would do better in general by writing your malloc in the
form...

hash_array[n] = malloc(sizeof *hash_array[n]);

So, search elsewhere for hashing methods then, if you have trouble coding a
given technique in C, come back and ask questions about your C code.
 
C

Chris Torek

I forgot to mention that in the newer version of the standard, C99, implicit
int was made obsolete, so lack of prototypes requires diagnostics. Just FYI.

Slight correction: it is the lack of a declaration that requires
a diagnostic -- but you can still declare functions without giving
prototypes for them.

The following illustrates this by example:

int f1(); /* declares f1(), but does not provide a prototype */
int f2(void); /* declares f2() and provides a prototype */

void f3(x, y, z) double x, y, z; { /* defines f3(), no prototype */
...
}
void f4(double x, double y, double z) { /* defines f3, gives prototype */
...
}

If you study the Standard, you will find that every function
definition is a declaration, and every prototype is a declaration,
but not every declaration is a prototype. The "old-style", "K&R-1"
function declarations and definitions are the ones that fail to
provide prototypes.

Curiously, while a K&R-1 style function definition defines a
function whose prototype is trivial for any C compiler to calculate
at that point, the Standard does not require compilers to do this.
 
B

Ben Pfaff

Peter Pichler said:
I forgot to mention that in the newer version of the standard, C99, implicit
int was made obsolete, so lack of prototypes requires diagnostics. Just FYI.

The C99 foreword contains a laundry list of differences from C89.
One of the items in the list simply says "remove implicit int".
I've now seen several misinterpretations of what this means.

Here's what it really means: in C99, every declaration must
include a type-specifier, such as `char', `short', `int', `long',
`signed', or `unsigned'. That means that each declaration on the
left below is no longer allowed, and must be replaced by the
corresponding declaration on the right:

valid in C89 only valid in C89, C99
----------------- -----------------
const w; const int w;
static x; static int x;
typedef y; typedef int y;
extern z[]; extern int z[];
foo () {} int foo () {}
static bar () {} static int bar () {}

This change is an intentional consequence of a pair of fairly
minor changes from C89 to C99:

1. C89 contained the following language in section 6.5.2
"Type specifiers":

Each list of type specifiers shall be one of the
following sets (delimited by commas, when there is
more than one set on a line); the type specifiers may
occur in any order, possibly intermixed with the other
declaration specifiers.

...
- int, signed, signed int, or no type specifiers
...

C99 added the following sentence at the top of the
paragraph, which is now in section 6.7.2:

At least one type specifier shall be given in the
declaration specifiers in each declaration...

C99 also removed the choice `no type specifiers' from the
list of aliases for int.

2. C89 sections 6.7.1 "Function definitions" makes
declaration-specifiers optional in function-definition.
In C99, now in section 6.9.1, declaration-specifiers are
mandatory.

Now for what removal of "implicit int" does not mean:

* All of the following are valid declarations in both C89 and
C99, because all of them include a type-specifier. This is
admittedly confusing because the keyword `int' is indeed
implied in these declarations. It would be more accurate
to say that C99 removes "implicit type-specifier", not
"implicit int", but for whatever reason, the committee
didn't choose that wording.

short w;
long x;
signed y;
unsigned z;

* The following have always been invalid declarations:

x;
foo ();

The problem is that neither one of these includes any
declaration-specifier, whereas at least one is mandatory.
Three classes of syntax can be declaration-specifiers: a
type-specifier (already described), a type-qualifier
(`const', `volatile', and in C99 `restrict'), or a
storage-class-specifier (`typedef', `extern', `static',
`auto', `register'). Adding any of these to either of
these declarations will make it valid.

(If these constructions could be accepted as declarations,
then declarations would be ambiguous with statements at
block scope.)

Notwithstanding this, the following was valid in C89,
though it is no longer in C99:

foo () {}

The reason is that function definitions are subject to
syntax rules separate and somewhat different from other
declarations; see #2 above.

* In C89, undeclared functions could be called, with the
compiler assuming that the function returned `int'. In
C99, this is no longer true, but it is a separate change,
described in the foreword as "remove implicit function
declaration". So this is not, strictly speaking, removal
of "implicit int" either.
 
P

Peter Pichler

Chris Torek said:
FYI.

Slight correction: it is the lack of a declaration that requires
a diagnostic -- but you can still declare functions without giving
prototypes for them.

Doh! Thanks for pointing that out. The same to Ben, who gave a very
exhaustive explanation with a long list of examples. I blame the late night
hour ;-)
 
B

Barry Schwarz

Here's what I am trying to write a simple hash table:

struct Employee
{
char name[30];
int id;
char department[10];
float salary;
};

struct Employee *hash_array[MAX_SIZE];

hash_array[n] = (struct Employee*)malloc(sizeof(Employee));
strcpy(*hash_array[n].name, "John Smith");
*hash_array[n].id = 1234;

The . operator has higher precedence than the * operator so this means
the same as *(hash_array[n].id) when what you want is (*hash[n]).id
which is accomplished with the -> operator as in
hash_array[n]->id = 1234;
*hash_array[n].department = "Marketing";

department is an array. You cannot assign a string literal to an
array with =. You need to copy the data to the array as you did for
name.
*hash_array[n].salary = 4000;
...

hash_array will contain all the pointers pointing to each object of Employee
type.

My questions are:

How to determine MAX_SIZE above?
How to calculate index, n, to locate an appropriate array item? I read
somewhere prime number can be used for such purpose(build a hash function).
Could you show me how this work?

Thanks!



<<Remove the del for email>>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top