hash table

Discussion in 'C Programming' started by al, Jan 2, 2004.

  1. al

    al Guest

    Here's what I am trying to write a simple hash table:

    struct Employee
    {
    char name[30];
    int id;
    char department[10];
    float salary;
    };

    struct Employee *hash_array[MAX_SIZE];

    hash_array[n] = (struct Employee*)malloc(sizeof(Employee));
    strcpy(*hash_array[n].name, "John Smith");
    *hash_array[n].id = 1234;
    *hash_array[n].department = "Marketing";
    *hash_array[n].salary = 4000;
    ....

    hash_array will contain all the pointers pointing to each object of Employee
    type.

    My questions are:

    How to determine MAX_SIZE above?
    How to calculate index, n, to locate an appropriate array item? I read
    somewhere prime number can be used for such purpose(build a hash function).
    Could you show me how this work?

    Thanks!
     
    al, Jan 2, 2004
    #1
    1. Advertising

  2. al

    Leo Custodio Guest

    If you are trying to build a simple hash table, why not use mod? (%)

    Leo Custodio


    --
    -----BEGIN PGP PUBLIC KEY BLOCK-----
    Version: 2.6.2

    mQCNAz/nyswAAAEEAM1Jl14YqNlrUGmr4vh5OKGbDg5qiFnY/Ioqa5j5j9jlTsiH
    7EJNlhIvu5OV223D0REUmWbFaKBQlnZAaDRRROb52YPuZ8NQfyu/C5zvTz8qubEx
    jWn+nYryqKZxQsDwjntkNIMxx5n+QB7WhDltenCFE/VxYhsTa59EWqUqkz/RAAUR
    tC5MZW9uYXJkbyBDLiBDdXN0b2RpbyA8YWxpZW5zcHJpdGVAaG90bWFpbC5jb20+
    =xAh5
    -----END PGP PUBLIC KEY BLOCK-----
    "al" <> escreveu na mensagem
    news:R1mJb.584231$...
    > Here's what I am trying to write a simple hash table:
    >
    > struct Employee
    > {
    > char name[30];
    > int id;
    > char department[10];
    > float salary;
    > };
    >
    > struct Employee *hash_array[MAX_SIZE];
    >
    > hash_array[n] = (struct Employee*)malloc(sizeof(Employee));
    > strcpy(*hash_array[n].name, "John Smith");
    > *hash_array[n].id = 1234;
    > *hash_array[n].department = "Marketing";
    > *hash_array[n].salary = 4000;
    > ...
    >
    > hash_array will contain all the pointers pointing to each object of

    Employee
    > type.
    >
    > My questions are:
    >
    > How to determine MAX_SIZE above?
    > How to calculate index, n, to locate an appropriate array item? I read
    > somewhere prime number can be used for such purpose(build a hash

    function).
    > Could you show me how this work?
    >
    > Thanks!
    >
    >
    >
     
    Leo Custodio, Jan 2, 2004
    #2
    1. Advertising

  3. "al" <> wrote in message
    news:R1mJb.584231$...
    > Here's what I am trying to write a simple hash table:
    >
    > struct Employee
    > {
    > char name[30];
    > int id;
    > char department[10];
    > float salary;
    > };
    >
    > struct Employee *hash_array[MAX_SIZE];
    >
    > hash_array[n] = (struct Employee*)malloc(sizeof(Employee));


    Do not cast return from malloc(). It buys you nothing and can actually hide
    serious errors if you forget to #include <stdlib.h>. Moreover, the fewer
    useless casts in your code the easier it is to read and maintain.

    > strcpy(*hash_array[n].name, "John Smith");
    > *hash_array[n].id = 1234;
    > *hash_array[n].department = "Marketing";
    > *hash_array[n].salary = 4000;
    > ...
    >
    > hash_array will contain all the pointers pointing to each object of

    Employee
    > type.
    >
    > My questions are:
    >
    > How to determine MAX_SIZE above?


    It depends on your hashing function. How do you calculate the hashing value?
    How many possible values are there going to be? That would be your MAX_SIZE.

    > How to calculate index, n, to locate an appropriate array item? I read
    > somewhere prime number can be used for such purpose(build a hash

    function).
    > Could you show me how this work?


    This is really not a C question, so it is outside the scope of this
    newsgroup. You didn't even tell us what you want to base the hashing on. Is
    it the name, the id, or even the salary? No, don't bother answering, take
    this question to comp.programming where (I think) it is more topical.

    Hope this helps,

    Peter
     
    Peter Pichler, Jan 2, 2004
    #3
  4. al

    CBFalconer Guest

    al wrote:
    >
    > Here's what I am trying to write a simple hash table:
    >
    > struct Employee
    > {
    > char name[30];
    > int id;
    > char department[10];
    > float salary;
    > };
    >
    > struct Employee *hash_array[MAX_SIZE];
    >
    > hash_array[n] = (struct Employee*)malloc(sizeof(Employee));
    > strcpy(*hash_array[n].name, "John Smith");
    > *hash_array[n].id = 1234;
    > *hash_array[n].department = "Marketing";
    > *hash_array[n].salary = 4000;
    > ...
    >
    > hash_array will contain all the pointers pointing to each object
    > of Employee type.
    >
    > My questions are:
    >
    > How to determine MAX_SIZE above?
    > How to calculate index, n, to locate an appropriate array item?
    > I read somewhere prime number can be used for such purpose(build
    > a hash function). Could you show me how this work?


    This is the sort of thing I wrote hashlib for. See:

    <http://cbfalconer.home.att.net/download/>

    which should reduce your problems to determining suitable hash
    functions and initializing and destroying struct Employee types.
    hashlib will function with extremely poor hash functions, albeit
    not as well as with good hash functions.

    --
    Chuck F () ()
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
     
    CBFalconer, Jan 2, 2004
    #4
  5. al

    Jack Klein Guest

    On Fri, 02 Jan 2004 23:04:04 GMT, in comp.lang.c "Leo Custodio"
    <> wrote:

    If you are trying to build a simple hash table, why not use mod? (%)

    Leo Custodio


    --
    -----BEGIN PGP PUBLIC KEY BLOCK-----
    Version: 2.6.2

    mQCNAz/nyswAAAEEAM1Jl14YqNlrUGmr4vh5OKGbDg5qiFnY/Ioqa5j5j9jlTsiH
    7EJNlhIvu5OV223D0REUmWbFaKBQlnZAaDRRROb52YPuZ8NQfyu/C5zvTz8qubEx
    jWn+nYryqKZxQsDwjntkNIMxx5n+QB7WhDltenCFE/VxYhsTa59EWqUqkz/RAAUR
    tC5MZW9uYXJkbyBDLiBDdXN0b2RpbyA8YWxpZW5zcHJpdGVAaG90bWFpbC5jb20+
    =xAh5
    -----END PGP PUBLIC KEY BLOCK-----
    "al" <> escreveu na mensagem
    news:R1mJb.584231$...
    > Here's what I am trying to write a simple hash table:
    >
    > struct Employee
    > {
    > char name[30];
    > int id;
    > char department[10];


    [snip]

    I will try this again. Do not top post. New material you add belongs
    after or interspersed with quoted material you are replying to. And
    get rid of the PGP garbage, it has nothing to do with usenet.

    [posted & mailed]

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
    comp.lang.c++ http://www.parashift.com/c -faq-lite/
    alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c /faq
     
    Jack Klein, Jan 2, 2004
    #5
  6. al

    al Guest

    Peter Pichler <> wrote in message
    news:3ff5fc8b$...
    > "al" <> wrote in message
    > news:R1mJb.584231$...
    > > Here's what I am trying to write a simple hash table:
    > >
    > > struct Employee
    > > {
    > > char name[30];
    > > int id;
    > > char department[10];
    > > float salary;
    > > };
    > >
    > > struct Employee *hash_array[MAX_SIZE];
    > >
    > > hash_array[n] = (struct Employee*)malloc(sizeof(Employee));

    >
    > Do not cast return from malloc(). It buys you nothing and can actually

    hide
    > serious errors if you forget to #include <stdlib.h>. Moreover, the fewer
    > useless casts in your code the easier it is to read and maintain.
    >

    Thanks!

    If don't use cast here, then how to convert the void* pointer to Employee *?

    What serious errors could such casting cause?
     
    al, Jan 3, 2004
    #6
  7. al

    Ben Pfaff Guest

    "al" <> writes:

    > If don't use cast here, then how to convert the void* pointer to Employee *?


    The compiler will do the conversion automatically upon
    assignment, as in
    Employee *e;
    e = malloc (sizeof *e);

    > What serious errors could such casting cause?


    It could mask a failure to #include <stdlib.h>, which yields
    undefined behavior.
    --
    "Debugging is twice as hard as writing the code in the first place.
    Therefore, if you write the code as cleverly as possible, you are,
    by definition, not smart enough to debug it."
    --Brian Kernighan
     
    Ben Pfaff, Jan 3, 2004
    #7
  8. "al" <> wrote in message
    news:GEnJb.584711$0v4.23244818@bgtnsc04-> Peter Pichler <>
    wrote in message
    > > "al" <> wrote in message
    > > >
    > > > hash_array[n] = (struct Employee*)malloc(sizeof(Employee));

    > >
    > > Do not cast return from malloc(). It buys you nothing and can actually

    > hide
    > > serious errors if you forget to #include <stdlib.h>. Moreover, the fewer
    > > useless casts in your code the easier it is to read and maintain.
    > >

    > Thanks!
    >
    > If don't use cast here, then how to convert the void* pointer to Employee

    *?

    In C, void* is always implicitly converted to a pointer to any data type, so
    the cast is unnecessary.

    > What serious errors could such casting cause?


    When you miss the prototype, in C89 (the older version of the standard) the
    return type is implicitly considered to be int. This is plainly wrong in
    case of malloc() that returns void*. Even if sizeof(int)==sizeof(void*), the
    value may be returned in a different way, so you are not guaranteed to get
    anything sensible. Hence UB (undefined behaviour). Casting does not /cause/
    such problems, but it may /hide/ them.

    Peter
     
    Peter Pichler, Jan 3, 2004
    #8
  9. "Peter Pichler" <> wrote:
    > When you miss the prototype, in C89 (the older version of the standard)

    the
    > return type is implicitly considered to be int.

    <snip>

    I forgot to mention that in the newer version of the standard, C99, implicit
    int was made obsolete, so lack of prototypes requires diagnostics. Just FYI.

    Peter
     
    Peter Pichler, Jan 3, 2004
    #9
  10. al

    Malcolm Guest

    "al" <> wrote in message news:GEnJb.584711
    > If don't use cast here, then how to convert the void* pointer to
    > Employee *?
    >
    > What serious errors could such casting cause?
    >

    C allows an implicit conversion from void * to another pointer type, whilst
    C++ demands a cast. If there is any chance that your code will be run
    through a C++ compiler for any reason then you need to cast.
    Most clc regs are of the opinion that a cast is a bad idea in definite C
    source, because it could mask a failure to include stdlib.h - the cast will
    suppress warnings about assuming that malloc() returns an int because there
    is no prototype in scope.

    On hash tables - this is more an algorithm that an C issue. However I will
    explain the principle briefly.

    A hash table is a way of accessing information very quickly from a key (a
    unique identifier) when entries are constantly being added to the database.

    The idea is that you take the unique key (eg a car registration, an employee
    id) and apply a hash function to it to convert the key into an index into
    the look-up table.

    The problem comes when two keys hash to the same value. To resolve this
    situation, you employ a parking-lot algorithm - you try the next slot up and
    then the next, until you find a free slot. This also adds complexity to the
    seek - you need to test the hash value, check the key, and if it doesn't
    match work upwards until you find your data.

    If you come to an empty slot, you know that there is no matching entry.
    There are also problems in removing data, which I won't go into.

    As a rule of thumb you need about twice as many slots in your hash tablae as
    you have data items, to prevent degenerate behaviour when you get too many
    collisions.
     
    Malcolm, Jan 3, 2004
    #10
  11. "al" <> wrote in message
    news:R1mJb.584231$...
    > Here's what I am trying to write a simple hash table:
    >
    > struct Employee
    > {
    > char name[30];
    > int id;
    > char department[10];
    > float salary;
    > };
    >
    > struct Employee *hash_array[MAX_SIZE];
    > hash_array[n] = (struct Employee*)malloc(sizeof(Employee));
    > strcpy(*hash_array[n].name, "John Smith");
    > *hash_array[n].id = 1234;
    > *hash_array[n].department = "Marketing";
    > *hash_array[n].salary = 4000;
    > ...
    >
    > hash_array will contain all the pointers pointing to each object of
    > Employee type.
    >
    > My questions are:
    >
    > How to determine MAX_SIZE above?
    > How to calculate index, n, to locate an appropriate array item? I read
    > somewhere prime number can be used for such purpose(build a hash
    > function).
    > Could you show me how this work?


    Google for hashing techniques. It's a non-trivial subject and largely
    outside the domain of clc since it is not ISO C specific.

    The one question (which you didn't ask) which clc certainly can help you
    with is "How do get this to compile!"

    The . operator has a higher precedence than unary *, so an expression
    like...

    *hash_array[n].id

    ....is parsed as...

    *(hash_array[n].id)

    But hash_array[n] is not a struct, it's a pointer to struct so the '.id'
    part should cause your compiler to question your code. What you want is...

    hash_array[n]->id

    Also, you're code...

    strcpy(*hash_array[n].name, "John Smith");

    ....even after adding the above correction, attempts to copy data through an
    uninitialised pointer 'name'.

    Note that C is not C++ (don't use a C++ compiler to compile C; at least, not
    when you don't know what you're doing). Declaring a struct tag will not make
    that tag a stand alone type name. You're sizeof(Employee) is invalid without
    an
    appropriate typedef (say)...

    typedef struct Employee Employee;

    That said, you would do better in general by writing your malloc in the
    form...

    hash_array[n] = malloc(sizeof *hash_array[n]);

    So, search elsewhere for hashing methods then, if you have trouble coding a
    given technique in C, come back and ask questions about your C code.

    --
    Peter
     
    Peter Nilsson, Jan 3, 2004
    #11
  12. al

    Chris Torek Guest

    In article <>
    Peter Pichler <> writes:
    >I forgot to mention that in the newer version of the standard, C99, implicit
    >int was made obsolete, so lack of prototypes requires diagnostics. Just FYI.


    Slight correction: it is the lack of a declaration that requires
    a diagnostic -- but you can still declare functions without giving
    prototypes for them.

    The following illustrates this by example:

    int f1(); /* declares f1(), but does not provide a prototype */
    int f2(void); /* declares f2() and provides a prototype */

    void f3(x, y, z) double x, y, z; { /* defines f3(), no prototype */
    ...
    }
    void f4(double x, double y, double z) { /* defines f3, gives prototype */
    ...
    }

    If you study the Standard, you will find that every function
    definition is a declaration, and every prototype is a declaration,
    but not every declaration is a prototype. The "old-style", "K&R-1"
    function declarations and definitions are the ones that fail to
    provide prototypes.

    Curiously, while a K&R-1 style function definition defines a
    function whose prototype is trivial for any C compiler to calculate
    at that point, the Standard does not require compilers to do this.
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
     
    Chris Torek, Jan 3, 2004
    #12
  13. al

    Ben Pfaff Guest

    "Peter Pichler" <> writes:

    > I forgot to mention that in the newer version of the standard, C99, implicit
    > int was made obsolete, so lack of prototypes requires diagnostics. Just FYI.


    The C99 foreword contains a laundry list of differences from C89.
    One of the items in the list simply says "remove implicit int".
    I've now seen several misinterpretations of what this means.

    Here's what it really means: in C99, every declaration must
    include a type-specifier, such as `char', `short', `int', `long',
    `signed', or `unsigned'. That means that each declaration on the
    left below is no longer allowed, and must be replaced by the
    corresponding declaration on the right:

    valid in C89 only valid in C89, C99
    ----------------- -----------------
    const w; const int w;
    static x; static int x;
    typedef y; typedef int y;
    extern z[]; extern int z[];
    foo () {} int foo () {}
    static bar () {} static int bar () {}

    This change is an intentional consequence of a pair of fairly
    minor changes from C89 to C99:

    1. C89 contained the following language in section 6.5.2
    "Type specifiers":

    Each list of type specifiers shall be one of the
    following sets (delimited by commas, when there is
    more than one set on a line); the type specifiers may
    occur in any order, possibly intermixed with the other
    declaration specifiers.

    ...
    - int, signed, signed int, or no type specifiers
    ...

    C99 added the following sentence at the top of the
    paragraph, which is now in section 6.7.2:

    At least one type specifier shall be given in the
    declaration specifiers in each declaration...

    C99 also removed the choice `no type specifiers' from the
    list of aliases for int.

    2. C89 sections 6.7.1 "Function definitions" makes
    declaration-specifiers optional in function-definition.
    In C99, now in section 6.9.1, declaration-specifiers are
    mandatory.

    Now for what removal of "implicit int" does not mean:

    * All of the following are valid declarations in both C89 and
    C99, because all of them include a type-specifier. This is
    admittedly confusing because the keyword `int' is indeed
    implied in these declarations. It would be more accurate
    to say that C99 removes "implicit type-specifier", not
    "implicit int", but for whatever reason, the committee
    didn't choose that wording.

    short w;
    long x;
    signed y;
    unsigned z;

    * The following have always been invalid declarations:

    x;
    foo ();

    The problem is that neither one of these includes any
    declaration-specifier, whereas at least one is mandatory.
    Three classes of syntax can be declaration-specifiers: a
    type-specifier (already described), a type-qualifier
    (`const', `volatile', and in C99 `restrict'), or a
    storage-class-specifier (`typedef', `extern', `static',
    `auto', `register'). Adding any of these to either of
    these declarations will make it valid.

    (If these constructions could be accepted as declarations,
    then declarations would be ambiguous with statements at
    block scope.)

    Notwithstanding this, the following was valid in C89,
    though it is no longer in C99:

    foo () {}

    The reason is that function definitions are subject to
    syntax rules separate and somewhat different from other
    declarations; see #2 above.

    * In C89, undeclared functions could be called, with the
    compiler assuming that the function returned `int'. In
    C99, this is no longer true, but it is a separate change,
    described in the foreword as "remove implicit function
    declaration". So this is not, strictly speaking, removal
    of "implicit int" either.
    --
    Go not to Usenet for counsel, for they will say both no and yes.
     
    Ben Pfaff, Jan 3, 2004
    #13
  14. "Chris Torek" <> wrote:
    > Peter Pichler <> writes:
    > >I forgot to mention that in the newer version of the standard, C99,

    implicit
    > >int was made obsolete, so lack of prototypes requires diagnostics. Just

    FYI.
    >
    > Slight correction: it is the lack of a declaration that requires
    > a diagnostic -- but you can still declare functions without giving
    > prototypes for them.


    Doh! Thanks for pointing that out. The same to Ben, who gave a very
    exhaustive explanation with a long list of examples. I blame the late night
    hour ;-)
     
    Peter Pichler, Jan 3, 2004
    #14
  15. On Fri, 02 Jan 2004 22:13:05 GMT, "al" <> wrote:

    >Here's what I am trying to write a simple hash table:
    >
    >struct Employee
    >{
    > char name[30];
    > int id;
    > char department[10];
    > float salary;
    >};
    >
    >struct Employee *hash_array[MAX_SIZE];
    >
    >hash_array[n] = (struct Employee*)malloc(sizeof(Employee));
    >strcpy(*hash_array[n].name, "John Smith");
    >*hash_array[n].id = 1234;


    The . operator has higher precedence than the * operator so this means
    the same as *(hash_array[n].id) when what you want is (*hash[n]).id
    which is accomplished with the -> operator as in
    hash_array[n]->id = 1234;

    >*hash_array[n].department = "Marketing";


    department is an array. You cannot assign a string literal to an
    array with =. You need to copy the data to the array as you did for
    name.

    >*hash_array[n].salary = 4000;
    >...
    >
    >hash_array will contain all the pointers pointing to each object of Employee
    >type.
    >
    >My questions are:
    >
    >How to determine MAX_SIZE above?
    >How to calculate index, n, to locate an appropriate array item? I read
    >somewhere prime number can be used for such purpose(build a hash function).
    >Could you show me how this work?
    >
    >Thanks!
    >
    >




    <<Remove the del for email>>
     
    Barry Schwarz, Jan 3, 2004
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. David Williams
    Replies:
    2
    Views:
    1,171
    Jacob Yang [MSFT]
    Aug 12, 2003
  2. Rio
    Replies:
    4
    Views:
    1,241
  3. Pieter Claassen
    Replies:
    1
    Views:
    1,150
    CBFalconer
    Aug 4, 2004
  4. rp
    Replies:
    1
    Views:
    583
    red floyd
    Nov 10, 2011
  5. Srijayanth Sridhar
    Replies:
    19
    Views:
    670
    David A. Black
    Jul 2, 2008
Loading...

Share This Page