Dual search data structure

Discussion in 'C++' started by Divick, Nov 18, 2005.

  1. Divick

    Divick Guest

    Hi all,
    does anyone know of a data structure in which I can search by
    the key and by the value equally fast? I could use hashes, but then I
    can only search in constant time by key, not by value. If I want to
    search by value I have to do a linear search.
    Other related questions I have:
    1. Does anyone know a hash function that does not lead to collisions,
    and what is its time complexity?
    2. Why is it that to grow a hash array you need to create a bigger
    array and then rehash the keys from the old array? In my view there
    could be better strategies to grow the array; maybe, instead of just
    growing the array, use associated pointers to store another hash
    array, which could contain keys mapped with a totally different hash
    function. This way the hash can be grown without much memory
    overhead, and re-insertion is not needed. I am not sure whether this
    is standard practice. If somebody can throw me some links or
    pointers to similar literature, it would help me a lot.

    Any help is appreciated,
    Thanks,
    Divick
     
    Divick, Nov 18, 2005
    #1

  2. Divick

    rwf_20 Guest

    Divick wrote:
    > 1. Does anyone know a hash function that does not lead to collisions,
    > and what is its time complexity?


    There is no such thing. Good hash functions are those for which the
    fastest way to find a collision is to use brute force. For practical
    purposes, any of the famous ones are fine.

    http://en.wikipedia.org/wiki/Cryptographic_hash_function

    Ryan
     
    rwf_20, Nov 18, 2005
    #2

  3. Divick

    KeithSpook Guest

    Divick wrote:
    > Hi all,
    > does anyone know of a data structure in which I can search by
    > the key and by the value equally fast? I could use hashes, but then
    > I can only search in constant time by key, not by value. If I want
    > to search by value I have to do a linear search.
    >


    I had a similar problem not too long ago... my solution was to wrap a
    std::map<Key, Obj*> and a std::multiset<Obj*, ptr_less<Obj> > in my own
    class. The deal was that I could get the Key from the object, so the
    multiset worked as an index onto the map.
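    A minimal sketch of that kind of wrapper might look like the
    following (the `Obj` fields, the `DualIndex` class name, and the
    hand-written `ptr_less` comparator are all invented for
    illustration; the multiset orders object pointers by the field being
    searched on):

```cpp
#include <map>
#include <set>
#include <string>

struct Obj {
    std::string key;   // the key is recoverable from the object itself
    int payload;       // the "value" we also want to search by
};

// Compare the pointed-to objects (by the searched-on field),
// not the pointer values themselves.
struct ptr_less {
    bool operator()(const Obj* a, const Obj* b) const {
        return a->payload < b->payload;
    }
};

class DualIndex {
    std::map<std::string, Obj*> by_key_;     // key -> object
    std::multiset<Obj*, ptr_less> by_value_; // index ordered by payload
public:
    void insert(Obj* o) {                    // both indexes updated together
        by_key_[o->key] = o;
        by_value_.insert(o);
    }
    Obj* find_by_key(const std::string& k) const {
        std::map<std::string, Obj*>::const_iterator it = by_key_.find(k);
        return it == by_key_.end() ? 0 : it->second;
    }
    Obj* find_by_value(int v) const {        // O(log n) via the multiset
        Obj probe;
        probe.payload = v;                   // search with a stack "probe" object
        std::multiset<Obj*, ptr_less>::const_iterator it =
            by_value_.find(&probe);
        return it == by_value_.end() ? 0 : *it;
    }
};
```

    Both lookups are then logarithmic, at the cost of keeping the two
    containers in sync on every insert and erase.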

    hth
    ~KS
     
    KeithSpook, Nov 18, 2005
    #3
  4. Divick wrote:
    > Hi all,
    > does anyone know of a data structure in which I can search by
    > the key and by the value equally fast?


    If you're talking standard C++, then 'std::set' is the only such structure,
    because the value _is_ the key in it. If you're not talking standard C++,
    then you're in the wrong newsgroup. General programming questions should
    probably be addressed to 'comp.programming'.

    > [...]


    V
     
    Victor Bazarov, Nov 18, 2005
    #4
  5. Divick

    Axter Guest

    Divick wrote:
    > Hi all,
    > does anyone know of a data structure in which I can search by
    > the key and by the value equally fast?


    You can try using the bimap class in the following link:

    http://www.codeproject.com/vcpp/stl/bimap.asp
     
    Axter, Nov 18, 2005
    #5
  6. Divick

    Greg Guest

    Divick wrote:
    > Hi all,
    > does anyone know of a data structure in which I can search by
    > the key and by the value equally fast? I could use hashes, but then
    > I can only search in constant time by key, not by value. If I want
    > to search by value I have to do a linear search.


    It's possible to have one container be the reverse index of the
    contents of another container. The simplest form is to store the items
    in two different containers, each keyed on a different field. Of
    course, that approach is not very memory-efficient. To save memory,
    the items could instead be placed in a single std::set, with their
    iterators stored in one or more additional std::sets, each keyed on a
    particular field of the stored item. In essence, the containers serve
    as "indexes" into a group of records.

    Besides the additional overhead incurred with multiple indexes, there
    are additional housekeeping tasks required as well. Deleting an item
    requires deleting its iterator from all of the indexes. Similarly,
    adding a new item means adding it to each index container as well. But
    as long as there is one class whose API manages all interaction with
    the records, these requirements are not unmanageable.
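    A rough sketch of that record-plus-indexes idea (all names invented;
    a std::multimap is used for the secondary index here, rather than a
    set of iterators, simply so it can be probed by value):

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

struct Record {
    int id;                 // primary key
    std::string name;       // secondary key
    bool operator<(const Record& r) const { return id < r.id; }
};

// The record lives exactly once, in the std::set; the secondary index
// holds only iterators into that set.
class Table {
    typedef std::set<Record>::iterator RecIt;
    std::set<Record> records_;
    std::multimap<std::string, RecIt> name_index_;
public:
    void insert(const Record& r) {       // housekeeping: update every index
        RecIt it = records_.insert(r).first;
        name_index_.insert(std::make_pair(r.name, it));
    }
    const Record* find_by_id(int id) {
        Record probe;
        probe.id = id;                   // only the key field matters
        RecIt it = records_.find(probe);
        return it == records_.end() ? 0 : &*it;
    }
    const Record* find_by_name(const std::string& n) {
        std::multimap<std::string, RecIt>::iterator it = name_index_.find(n);
        return it == name_index_.end() ? 0 : &*it->second;
    }
};
```

    Erasing would have to remove the record's iterator from the name
    index before erasing the record itself, which is exactly the
    bookkeeping described above.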

    > Another related questions I have are
    > 1. Does anyone know a hash function that does not lead to collisions,
    > and what is its time complexity?


    The identity hash function would have no collisions, since the hash is
    the value of the item itself. Of course, hash values tend to be smaller
    than the original value, and any time you have to squeeze the same
    amount of data into a smaller space, there will be collisions.

    > 2. Why is it that to grow a hash array you need to create a bigger
    > array and then rehash the keys from the old array? In my view there
    > could be better strategies to grow the array; maybe, instead of just
    > growing the array, use associated pointers to store another hash
    > array, which could contain keys mapped with a totally different hash
    > function. This way the hash can be grown without much memory
    > overhead, and re-insertion is not needed. I am not sure whether this
    > is standard practice. If somebody can throw me some links or
    > pointers to similar literature, it would help me a lot.


    Generally once a hash container starts filling up its buckets it will
    double its allocated size and reinsert the items into the new, roomier
    hash. There's not much to be gained by more elaborate methods to avoid
    rehashing, since the easiest way to avoid it is to specify a reasonable
    estimate of the expected size of the hashed container when it is first
    created.
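    For illustration, presizing a hashed container might look like this
    (using today's std::unordered_map, which was not yet standard at the
    time of this thread; the function name is invented):

```cpp
#include <string>
#include <unordered_map>

// Reserving capacity up front means insertions up to the estimate
// never trigger the grow-and-rehash step described above.
std::unordered_map<std::string, int> make_table(std::size_t expected) {
    std::unordered_map<std::string, int> t;
    t.reserve(expected);  // allocates enough buckets for `expected` elements
    return t;
}
```

    After the reserve, inserting up to `expected` elements is guaranteed
    not to rehash, because the load factor can never exceed its maximum.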

    Greg
     
    Greg, Nov 18, 2005
    #6
  7. Divick

    Kaz Kylheku Guest

    Divick wrote:
    > Hi all,
    > does anyone know of a data structure in which I can search by
    > the key and by the value equally fast? I could use hashes, but then
    > I can only search in constant time by key, not by value. If I want
    > to search by value I have to do a linear search.


    This is done by an augmented data structure: a structure which contains
    two or more structures, maintaining them in parallel.

    Databases are one obvious example. You define a table, and a primary
    key to find records. Additionally, you can define secondary keys. The
    RDBMS software builds separate indexes for all of the keys, and
    maintains those indexes when you insert and remove records.

    A hash table is an index for finding records. For each key, you have to
    build a separate index.

    > Another related questions I have are
    > 1. Does anyone know a hash function that does not lead to collisions,
    > and what is its time complexity?


    Such functions are called perfect hashing functions. To create a
    perfect hashing function, you have to know all of the keys in advance.
    Then, you have to search for a function which maps all of the keys to
    some integer values without any collisions.

    There is a GNU program called gperf which generates perfect hashing
    functions for a set of strings. Its output is the C code for that
    hashing function.

    How might this be useful? It could be useful in creating a fast look-up
    table for the reserved keywords of a programming language, to use
    within a parser.
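    As a toy illustration of the idea (this is not gperf's actual
    output): for the fixed keyword set {"if", "for", "else", "while"},
    the four string lengths (2, 3, 4, 5) happen to be distinct, so the
    length alone is a collision-free hash into a small table:

```cpp
#include <cstddef>
#include <cstring>

// A hand-made perfect hash for one fixed keyword set. gperf searches
// for this kind of function automatically for arbitrary string sets.
const char* kw_lookup(const char* s) {
    static const char* const table[6] = { 0, 0, "if", "for", "else", "while" };
    std::size_t len = std::strlen(s);
    if (len < 2 || len > 5)
        return 0;                           // outside the keyword lengths
    const char* k = table[len];
    // The hash only names a candidate slot; one comparison confirms it.
    return std::strcmp(s, k) == 0 ? k : 0;
}
```

    The lookup is O(length of the string) with exactly one string
    comparison and no collision chains, which is why parsers use
    gperf-generated tables for reserved words.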

    > 2. Why is it that to grow a hash array you need to create a
    > bigger array and then rehash the keys from the old array?


    This isn't necessary. Only some algorithms require it. In my old hash
    table implementation (in the Kazlib library) this rehashing does not
    happen. The hash table entries remember the hash values of the keys:
    the raw ``unsigned long'' output values from the hashing function, not
    reduced into the table size. The table sizes are powers of two, so
    reducing a hash value to the table size is just a masking operation:
    taking the N lowest bits. When the table doubles in size, all it means
    is that one more bit from each hash value becomes significant. All
    entries which have a 1 in that bit position have to move to a chain in
    the upper half of the table. The ones which have a 0 in that newly
    revealed bit position stay in the same chain. These sister chains are
    all a fixed displacement apart. For instance, if the table goes from
    16 to 32 chains, the affected elements move from chain 0 to chain 16,
    from 1 to 17, 2 to 18 ... through 15 to 31. It's a simple loop that
    just checks one bit and flips some link pointers.
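    A sketch of that doubling step on a simplified chained table (names
    invented; Kazlib's actual code differs):

```cpp
#include <cstddef>
#include <vector>

// Each entry remembers its full hash value; its bucket is hash & (size - 1).
struct Node {
    unsigned long hash;
    int value;
    Node* next;
};

// Insert at the head of the chain selected by masking with the table size.
void insert(std::vector<Node*>& table, unsigned long hash, int value) {
    std::size_t i = hash & (table.size() - 1);
    Node* n = new Node;
    n->hash = hash;
    n->value = value;
    n->next = table[i];
    table[i] = n;
}

// Doubling the table: an entry moves iff the newly significant hash bit
// is set, and then it moves by exactly old_size buckets (sister chain).
void grow(std::vector<Node*>& table) {
    std::size_t old_size = table.size();       // a power of two
    table.resize(old_size * 2);                // new buckets start empty
    for (std::size_t i = 0; i < old_size; ++i) {
        Node** src = &table[i];
        while (*src) {
            if ((*src)->hash & old_size) {     // new bit is 1: relocate
                Node* n = *src;
                *src = n->next;                // unlink from the old chain
                n->next = table[i + old_size]; // link into the sister chain
                table[i + old_size] = n;
            } else {
                src = &(*src)->next;           // new bit is 0: stays put
            }
        }
    }
}
```

    No hash function is re-run: the stored ``unsigned long'' values are
    just re-masked with one more significant bit.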

    > In my view there could be better strategies to grow the array;
    > maybe, instead of just growing the array, use associated pointers to
    > store another hash array, which could contain keys mapped with a
    > totally different hash function. This way the hash can be grown
    > without much memory overhead, and re-insertion is not needed. I am
    > not sure whether this is standard practice. If somebody can


    The implied algorithm is not clear from your vague description.

    > throw me some links or pointers to similar literature, it would
    > help me a lot.


    There are lazy ways to grow hash tables. Instead of actually moving the
    elements, you can leave them where they are, but modify the lookup
    algorithm to find them anyway. When searching for an item, if you don't
    find it, then try the sister chain in the bottom half of the table.
    I.e. pretend that the table is still the small one and try that way.
    Keep subdividing recursively. If you find the item, then move it to its
    proper location in the full table. Newly inserted items go to their
    proper locations relative to the new size, of course.
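    A sketch of that lazy lookup (simplified; the move-to-proper-chain
    step is noted in a comment but omitted here):

```cpp
#include <cstddef>
#include <vector>

struct Node {
    unsigned long hash;
    int value;
    Node* next;
};

// The table has grown to table.size() buckets, but entries inserted
// before the growth may still sit in the chain a smaller table would
// have used. Retry the search with each smaller power-of-two mask,
// down to the size the table had originally.
Node* lazy_find(const std::vector<Node*>& table, unsigned long hash,
                std::size_t original_size) {
    for (std::size_t size = table.size(); size >= original_size; size /= 2) {
        std::size_t i = hash & (size - 1);     // pretend the table is this big
        for (Node* n = table[i]; n != 0; n = n->next)
            if (n->hash == hash)
                return n;  // a full version would now move n to its proper chain
    }
    return 0;
}
```

    Each successful lookup can migrate the found entry to its location
    under the current mask, so the table gradually fixes itself up
    instead of paying for one big rehash.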
     
    Kaz Kylheku, Nov 19, 2005
    #7
  8. Divick

    g Guest

    !!!!!!! boost multi-index container !!!!!!
     
    g, Nov 19, 2005
    #8
  9. Divick

    Divick Guest

    Is it thread-safe? As far as I know the STL is not thread-safe. I
    forgot to mention that I need a thread-safe implementation.

    Divick
     
    Divick, Nov 21, 2005
    #9
  10. Divick

    Divick Guest

    >>If you're talking standard C++, then 'std::set' is the only such structure,
    >>because the value _is_ the key in it. If you're not talking standard C++,
    >>then you're in the wrong newsgroup. General programming questions should
    >>probably be addressed to 'comp.programming'.


    It is definitely meant for the C++ newsgroup. In other languages, like
    Perl and Lisp, I suppose there are data structures which could serve
    the purpose, but I want the data structure in C++ only.

    Divick
     
    Divick, Nov 21, 2005
    #10
  11. Divick

    Divick Guest

    >>!!!!!!! boost multi-index container !!!!!!
    Is it thread safe?

    Divick
     
    Divick, Nov 21, 2005
    #11
  12. Divick

    Divick Guest

    Greg,
    thanks for your solution, but that was the first solution I
    thought of, and I feel it is inelegant because of too much
    bookkeeping; the implementation would be prone to bugs. Above all,
    this more than doubles the memory requirements.

    Divick
     
    Divick, Nov 21, 2005
    #12
  13. Divick

    Divick Guest

    Hi Kaz,
    >> Databases are one obvious example. You define a table, and a primary
    >>key to find records ....

    Well, I don't want to use databases in my application as they would
    incur a lot of overhead.

    >>This isn't necessary. Only some algorithms require it. In my old hash
    >>table implementation (in the Kazlib library) this rehashing does not ....

    Sorry, I don't quite get your description. I would appreciate it if
    you could explain it a little more elaborately.

    >>The implied algorithm is not clear from your vague description.

    Well, it is indeed vague. It is just an idea; I have not actually
    implemented it. What I am talking about is a sort of multi-level hash
    data structure in which lookup/insertion/deletion would not be
    average-case O(1) but essentially n*T(operation), where n is the nth
    increment in size and T(operation) is the time taken to do the
    operation on one hash. Another way to describe this kind of data
    structure is a multi-level hash, where at the beginning or end of the
    hash array there would be a pointer to another hash if the current
    hash is almost full. This linked hash would be searched if the entry
    is not in the first hash.

    If the idea is still vague, I will write corresponding pseudo code
    and post it. I would certainly like to have more feedback on this.

    Thanks,
    Divick.
     
    Divick, Nov 21, 2005
    #13
