Hashing function (possibly OT)

Discussion in 'C Programming' started by Matt Bull, Jul 7, 2003.

  1. Matt Bull

    Matt Bull Guest

    Hi,

    I apologise in advance if this is off topic, but would appreciate any
    pointers you chaps might be able to provide.

    I'm relatively novice in the art of C so am after any suggestions about
    implementing a very simple hash (or quasi hash)

    I need to be able to store a counter and a timestamp for requests from
    particular device MAC addresses (6 bytes, presented as xx:xx:xx:xx:xx:xx
    where x is a hex digit), ideally in a dynamic data structure. Additionally I
    need to be able to access the data quickly, hence my thoughts to use a
    hashing function.

    This may be naive but I'd thought to code something along the line of a tree
    structure, with each octet representing a node holding 256 pointers to the
    next set of nodes (from most significant to least significant octect) with
    the last octect holding pointers to a simple record holding the data.

    This would mean that at most i would need to traverse 6 tree nodes before
    hitting the data whatever the volume of addresses.(?)

    Does this seem feasible?

    If so (and this is where I get unstuck :eek:)), in C i'd have a structure
    defined as an array of 255 pointers, then allocate space for one of these
    structures as I parse each octect of the mac address. I'm having difficulty
    in defining the structures correctly so that the last but one octet
    references the actual data rather than another set of node pointers.

    Can someone please advise on the best way to do this? Or if you think the
    above is complete pap suggest any easier or more sensible ( please :p)
    solutions. I'm open to advice/flaming/suggestions!

    Cheers,
    Matt
     
    Matt Bull, Jul 7, 2003
    #1
    1. Advertising

  2. Matt Bull

    Eric Sosman Guest

    Matt Bull wrote:
    >
    > I need to be able to store a counter and a timestamp for requests from
    > particular device MAC addresses (6 bytes, presented as xx:xx:xx:xx:xx:xx
    > where x is a hex digit), ideally in a dynamic data structure. Additionally I
    > need to be able to access the data quickly, hence my thoughts to use a
    > hashing function.
    >
    > This may be naive but I'd thought to code something along the line of a tree
    > structure, with each octet representing a node holding 256 pointers to the
    > next set of nodes (from most significant to least significant octect) with
    > the last octect holding pointers to a simple record holding the data.


    <off-topic>

    This approach would probably waste a lot of memory. For
    example, suppose your entire data base contains just two six-byte
    keys that happen to differ in the first byte. You'll allocate
    one 256-entry node for the first byte and two for each of the
    second through sixth bytes -- eleven 256-entry nodes in all, just
    for twelve bytes of key data?

    Incidentally, the scheme you describe is known as a "trie,"
    and doesn't sound like a hashing method at all.

    </off-topic>

    > If so (and this is where I get unstuck :eek:)), in C i'd have a structure
    > defined as an array of 255 pointers,


    Don't you mean 256?

    > then allocate space for one of these
    > structures as I parse each octect of the mac address. I'm having difficulty
    > in defining the structures correctly so that the last but one octet
    > references the actual data rather than another set of node pointers.


    The first step is to define your "payload," the data
    structure you'll find at the eventual end of whatever search
    path you'll take. A typical way to do this in C is with a
    struct, something like

    typedef struct {
    TimeStampType when;
    CounterType count;
    OtherType other;
    } Payload;

    Next, you need something capable of pointing to 256
    different Payloads. An array of pointers is just the trick:

    typedef Payload *Level6[256];

    That is, a `Level6' object is an array of 256 pointers to
    `Payload' objects. At the next level you'll need a bunch of
    pointers to Level6's, and so on up to the Level1 root:

    typedef Level6 *Level5[256];
    typedef Level5 *Level4[256];
    typedef Level4 *Level3[256];
    typedef Level3 *Level2[256];
    typedef Level2 *Level1[256];

    /* finally, here's the root of your trie: */
    Level1 root;

    .... and there you have it: The lone Level1 object points to 256
    Level2's, each Level2 points to 256 Level3's, and so on until
    the chosen Level6 object points to the desired Payload.

    However, this is an incredibly verbose way to proceed! The
    novice C programmer is usually comfortable with the idea of a
    "pointer to something" but starts to get queasy when confronted
    with a "pointer to pointer to something." The LevelN types have
    no real function other than to assign names to various "somethings"
    so the novice can visualize them more easily -- throw away all
    those names and you get the exact same structure, expressed in
    terms of multiple levels of pointers:

    Payload *****root[256];

    (Actually, there's a subtle difference between the two
    formulations: the LevelN version carries the "256-ness" information
    along while the latter version doesn't, and this will affect the way
    you calculate how much memory to allocate for each node. Still,
    the essential structure of a six-level trie is the same.)

    > Can someone please advise on the best way to do this? Or if you think the
    > above is complete pap suggest any easier or more sensible ( please :p)
    > solutions. I'm open to advice/flaming/suggestions!


    <off-topic>

    Choice of data structure and search/update method really hasn't
    anything to do with the C language. Once a choice has been made,
    the question of how best to express the solution in C would be
    topical, but the selection itself is driven by criteria that would
    be just as valid (or invalid) for Fortran, Java, BASIC, COBOL, ...

    Also, you haven't revealed enough about your situation to allow
    anything more than informed guesswork. For example, how many of
    these six-byte keys do you need to store? You say you need to
    access them "quickly," but you don't mention how many million queries
    per second you need to sustain. Once entered, do the keys persist
    forever or can they be deleted? And so on, and so forth -- and
    NONE of these apparently vital considerations has anything at all
    to do with C. Look elsewhere for advice, and return here if you're
    having difficulty with coding the chosen solution in C.

    </off-topic>

    Good luck!

    --
     
    Eric Sosman, Jul 7, 2003
    #2
    1. Advertising

  3. Matt Bull

    Matt Bull Guest

    Re: Hashing function (Definately OT)

    "Eric Sosman" <> wrote in message
    news:...
    > Matt Bull wrote:
    > >
    > > I need to be able to store a counter and a timestamp for requests from
    > > particular device MAC addresses (6 bytes, presented as xx:xx:xx:xx:xx:xx
    > > where x is a hex digit), ideally in a dynamic data structure.

    Additionally I
    > > need to be able to access the data quickly, hence my thoughts to use a
    > > hashing function.
    > >
    > > This may be naive but I'd thought to code something along the line of a

    tree
    > > structure, with each octet representing a node holding 256 pointers to

    the
    > > next set of nodes (from most significant to least significant octect)

    with
    > > the last octect holding pointers to a simple record holding the data.

    >
    > <off-topic>
    >
    > This approach would probably waste a lot of memory. For
    > example, suppose your entire data base contains just two six-byte
    > keys that happen to differ in the first byte. You'll allocate
    > one 256-entry node for the first byte and two for each of the
    > second through sixth bytes -- eleven 256-entry nodes in all, just
    > for twelve bytes of key data?
    >
    > Incidentally, the scheme you describe is known as a "trie,"
    > and doesn't sound like a hashing method at all.


    I've analysed most of the data we are processing at the moment and the keys
    would be in quite a low number of most significant byte "chunks", where the
    vast majority of difference lies in lower 3 octets. So it might be ok.

    This method seemed to offer the best of both ease of programming and memory
    use at the time, although I'm open to suggestions for another method that
    wouldn't make my brain melt :eek:)

    > </off-topic>
    >
    > > If so (and this is where I get unstuck :eek:)), in C i'd have a structure
    > > defined as an array of 255 pointers,

    >
    > Don't you mean 256?


    Yes I did! hehe

    > The first step is to define your "payload," the data
    > structure you'll find at the eventual end of whatever search
    > path you'll take. A typical way to do this in C is with a
    > struct, something like


    <snipped good stuff>
    >
    > (Actually, there's a subtle difference between the two
    > formulations: the LevelN version carries the "256-ness" information
    > along while the latter version doesn't, and this will affect the way
    > you calculate how much memory to allocate for each node. Still,
    > the essential structure of a six-level trie is the same.)
    >


    Ahhh thas excellent, just the info I was after!

    > > Can someone please advise on the best way to do this? Or if you think

    the
    > > above is complete pap suggest any easier or more sensible ( please :p)
    > > solutions. I'm open to advice/flaming/suggestions!

    >
    > <off-topic>
    >
    > Choice of data structure and search/update method really hasn't
    > anything to do with the C language. Once a choice has been made,
    > the question of how best to express the solution in C would be
    > topical, but the selection itself is driven by criteria that would
    > be just as valid (or invalid) for Fortran, Java, BASIC, COBOL, ...
    >


    Yes, apologies again for the off topic sway of the query. I'm somewhat stuck
    with implementing in C as thats the only supported API the application this
    code must interact with supports.

    > Also, you haven't revealed enough about your situation to allow
    > anything more than informed guesswork. For example, how many of
    > these six-byte keys do you need to store? You say you need to
    > access them "quickly," but you don't mention how many million queries
    > per second you need to sustain. Once entered, do the keys persist
    > forever or can they be deleted? And so on, and so forth -- and
    > NONE of these apparently vital considerations has anything at all
    > to do with C. Look elsewhere for advice, and return here if you're
    > having difficulty with coding the chosen solution in C.
    >


    Well, at the moment it needs to retain around 100,000 individual "payloads",
    persisting in memory until the application is terminated, but generally only
    seeing 100-500 new or duplicated keys per second.

    > </off-topic>
    >
    > Good luck!
    >


    Thanks very much for the information Eric, you've been very helpful!

    I'll let you know how I get on.

    :eek:)

    Cheers,
    Matt
     
    Matt Bull, Jul 7, 2003
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Owen Jacobson
    Replies:
    3
    Views:
    4,767
    CBFalconer
    May 26, 2005
  2. Pat

    Hashing function

    Pat, May 18, 2004, in forum: C++
    Replies:
    2
    Views:
    500
    Ivan Vecerina
    May 18, 2004
  3. Dave

    Hashing Function

    Dave, Nov 17, 2005, in forum: Python
    Replies:
    0
    Views:
    342
  4. Lawrence
    Replies:
    16
    Views:
    577
  5. January Weiner

    Which hashing function to use?

    January Weiner, Nov 16, 2007, in forum: C Programming
    Replies:
    11
    Views:
    538
    Paul Hsieh
    Nov 20, 2007
Loading...

Share This Page