Need a reliable way for faster lookup.

Discussion in 'C++' started by Sudarshan Narasimhan, Oct 7, 2009.

  1. All,

    I am developing a module on a telecom box. In my application I have a
    requirement to build a fault management framework. I used std::map to
    create a generic fault table which accesses fault objects by a string
    key.

    Now I am being asked to come up with a much faster lookup, better
    than O(log n). Most of the keys have a numeric structure to them,
    e.g. 1-A-1-T1-1 (which corresponds to a specific port on a card, on a
    slot, on a chassis, etc.), so I thought I could define a static array
    of pointers and manage the lookup that way.

    The problem is that I would then have to write a search function for
    every table: the key above is just one format, and there are many
    object types with different formats, so the array would have to be
    defined and interpreted differently for each. Also, some tables may
    contain strings that do not follow a numeric pattern like the one
    above.

    Do I need to use a hash function for this, or do you have other
    alternatives in mind? Can you refer me to a standard hash function (I
    haven't written a hash before), or suggest an idea for one? The nice
    thing about a hash function is that it works generically on any
    string, so I would only have to implement it in the base class and
    let any table that inherits from the generic table use it.

    Thanks in advance,
    Sud
    Sudarshan Narasimhan, Oct 7, 2009
    #1
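    For reference, the current design described in this post might look
    roughly like the following sketch (FaultObject and the member names
    are invented for illustration):

```cpp
#include <map>
#include <string>

// Hypothetical fault object; the real one would carry severity,
// timestamps, and so on.
struct FaultObject {
    std::string description;
};

// Generic fault table as described in the post: fault objects keyed by
// string, looked up through std::map in O(log n) comparisons.
class GenericFaultTable {
public:
    void add(const std::string& key, const FaultObject& fault) {
        table_[key] = fault;
    }
    // Returns a pointer to the fault object, or null if the key is absent.
    const FaultObject* find(const std::string& key) const {
        std::map<std::string, FaultObject>::const_iterator it =
            table_.find(key);
        return it == table_.end() ? 0 : &it->second;
    }
private:
    std::map<std::string, FaultObject> table_;
};
```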

  2. On Oct 7, 11:19 am, Sudarshan Narasimhan <>
    wrote:
    [snip]


    I need to mention here that the number of objects in a table is not
    huge; it is on the order of a few tens to a hundred. I am not sure a
    hash function would work well without many collisions in such a
    case, because I would have to take the hash mod the number of
    objects to index into the array.
    Sudarshan Narasimhan, Oct 7, 2009
    #2

  3. On 7 Oct, 08:42, Sudarshan Narasimhan <> wrote:
    > On Oct 7, 11:19 am, Sudarshan Narasimhan <>
    > wrote:
    > > I am developing a module on a telecom box. In my application, i have a
    > > requirement for building a fault management framework, i used std::map
    > > to create a generic fault table which accesses fault objects by using
    > > a key(string).

    >
    > > Now, i am being asked to come up with a much faster lookup better than
    > > O(log n).


    For 100 elements, that makes an average of around 5 comparisons.

    > > The thing with the key for the search is that most of them
    > > have a numeric background to them. For eg. 1-A-1-T1-1 (which
    > > corresponds to a specific port on a card, on a slot, on a chassis
    > > etc). So i thought i could define an array of pointers(statically) and
    > > manage this.


    Pointers to what?

    > > But the problem is that, i would have to write a search function for
    > > every table(because the above object is just one type, there are many
    > > such objects of different formats, hence the array would have to
    > > defined and interpreted differently). Also, for some tables, there may
    > > be strings in them which do not have a numeric pattern as above.

    >
    > > Do i need to use a hash function for the above? OR do you have any
    > > other alternatives in mind?


    std::tr1::unordered_map, or the non-standard hash_map.

    > > Can you refer me to a standard hash
    > > function(havent done a has earlier) or give idea for one?


    If you know all the possible names beforehand, you can generate a
    perfect hash (with gperf, for example).

    Otherwise, I've had good results with the Dan Bernstein hash (the
    newer version).

    > > The good
    > > thing is that if i implement a hash function, i can use it generically
    > > on any string, so i would have to implement it only in the base class
    > > and let any table which inherits from the generic table use this
    > > method.

    >
    > I need to mention here that the number of objects in a table is not
    > huge. Its in the order of a few tens to a hundred. Not sure if the
    > hash function would work fine in such a case with out many collisions
    > because i have to do mod(NUM of Objects) to lookup an array for the
    > same.


    The fewer collisions the better. After that, it is a matter of how
    many buckets you can afford.

    --
    Michael
    Michael Doubez, Oct 7, 2009
    #3
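    To make the suggestion above concrete, here is a sketch of the newer
    (XOR) variant of the Bernstein hash plugged into an unordered map.
    It is shown with std::unordered_map; on a TR1-era compiler the
    equivalent container is std::tr1::unordered_map. FaultId stands in
    for the real fault object type:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Dan Bernstein's string hash, the newer variant:
// multiply by 33, then XOR in the next byte.
struct DjbHash {
    std::size_t operator()(const std::string& s) const {
        std::size_t h = 5381;
        for (std::size_t i = 0; i < s.size(); ++i)
            h = (h * 33) ^ static_cast<unsigned char>(s[i]);
        return h;
    }
};

// Fault table keyed by strings such as "1-A-1-T1-1",
// with average O(1) lookup instead of std::map's O(log n).
typedef int FaultId;
typedef std::unordered_map<std::string, FaultId, DjbHash> FaultTable;
```

    The container handles the "mod number of buckets" step and
    collision chaining itself, so a table of a few tens of entries
    needs no special handling.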
  4. On Oct 7, 8:57 am, Michael Doubez <> wrote:
    > On 7 oct, 08:42, Sudarshan Narasimhan <> wrote:


    > > On Oct 7, 11:19 am, Sudarshan Narasimhan
    > > <> wrote:
    > > > I am developing a module on a telecom box. In my
    > > > application, i have a requirement for building a fault
    > > > management framework, i used std::map to create a generic
    > > > fault table which accesses fault objects by using a
    > > > key(string).


    > > > Now, i am being asked to come up with a much faster lookup
    > > > better than O(log n).


    > For 100 elements, this make an average of around 5 comparisons.


    And for 1000, about 10.

    My own benchmarks show that, under the particular conditions I
    ran them (Sun Sparc, my data sets, etc.), the best hash tables
    start to outperform std::map somewhere between 150 and 200
    elements, but that the difference doesn't become significant
    until the tables start to be even larger.

    > > > The thing with the key for the search is that most of them
    > > > have a numeric background to them. For eg. 1-A-1-T1-1
    > > > (which corresponds to a specific port on a card, on a
    > > > slot, on a chassis etc). So i thought i could define an
    > > > array of pointers(statically) and manage this.


    > Pointers to what ?


    I'm not sure I've understood what he means exactly either, but
    if the issue is that his keys tend to start with common
    prefixes (which significantly increases the time required for
    each comparison), then simply starting the comparison from the
    end might be a simple fix.

    > > > But the problem is that, i would have to write a search
    > > > function for every table(because the above object is just
    > > > one type, there are many such objects of different
    > > > formats, hence the array would have to defined and
    > > > interpreted differently). Also, for some tables, there may
    > > > be strings in them which do not have a numeric pattern as
    > > > above.


    > > > Do i need to use a hash function for the above? OR do you
    > > > have any other alternatives in mind?


    > std::tr1::unordered_map or non-standard hash_map.


    > > > Can you refer me to a standard hash function(havent done a
    > > > has earlier) or give idea for one?


    > If you know all the possible names beforehand, you can
    > generate a perfect hash (with gperf by example).


    > Otherwise, I've had good results with Dan Bernstein hash (the
    > new version).


    The Bernstein hash does tend to work fairly well, although it
    has some known weaknesses. Using 31 as a multiplier (instead of
    33), as does Java, gives similar results---maybe slightly
    better, but you really want a larger multiplier to avoid
    collisions when very short strings are used to index into a very
    large table. (For a string with two ASCII characters, the
    Bernstein hash will always result in a value less than 4318.)
    I've found 127 to work well, and FNV is also highly recommended.
    (It tends to be slow on machines with slow multiplication, like
    a Sparc, but on an Intel, that shouldn't be a problem.)

    > > > The good thing is that if i implement a hash function, i
    > > > can use it generically on any string, so i would have to
    > > > implement it only in the base class and let any table
    > > > which inherits from the generic table use this method.


    > > I need to mention here that the number of objects in a table
    > > is not huge. Its in the order of a few tens to a hundred.
    > > Not sure if the hash function would work fine in such a case
    > > with out many collisions because i have to do mod(NUM of
    > > Objects) to lookup an array for the same.


    > The less collision the better. Afterward, it is a matter of
    > how many bucket you can afford.


    There are two issues: the number of collisions, and the speed it
    takes to calculate the hash code. Using a CRC will generally
    result in a very low collision rate, but when you factor in the
    cost of calculating it, you lose any advantage over using one
    of the multiply-and-add/xor algorithms (which have only very few
    more collisions, and cost significantly less to calculate). If
    the table only contains a few tens to a hundred entries,
    however, I doubt that a hash table will outperform an std::map
    with an "optimized" comparison function. ("Optimized" according
    to the structure of the key strings.)

    For some (possibly outdated) results of the actual benchmarks I
    ran some years ago, see
    http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html. (My
    site seems to be accessible again.) I've been meaning to update
    this for some time, but haven't gotten around to it---still, for
    the data set consisting of 75 URLs, std::map is faster than any
    of the hash codes, even though most of the strings contain a
    common prefix ("http://"). (Note that when I did this, I wasn't
    aware of the Bernstein hash. Later measurements, however, show no
    significant difference between it and the Java hash. I've not
    tried the modified Bernstein hash, or, for that matter, changing
    addition to xor in any of the hashes which originally use
    addition. Another thing for the TODO list.)

    --
    James Kanze
    James Kanze, Oct 8, 2009
    #4
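    The "start the comparison from the end" idea mentioned above can be
    expressed as a custom comparator for std::map. Since keys like
    1-A-1-T1-1 share long common prefixes but tend to differ near the
    end, comparing the reversed strings finds the first difference
    sooner. A sketch (the comparator name is invented):

```cpp
#include <algorithm>
#include <map>
#include <string>

// Orders strings by comparing from the last character backwards.
// This is still a strict weak ordering, so std::map works unchanged;
// only the (irrelevant) iteration order of the table differs.
struct CompareFromEnd {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(a.rbegin(), a.rend(),
                                            b.rbegin(), b.rend());
    }
};

// Fault table with the "optimized" comparison; int stands in for
// the real fault object type.
typedef std::map<std::string, int, CompareFromEnd> FaultTable;
```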
  5. Michael Doubez <> writes:

    > On 7 oct, 08:42, Sudarshan Narasimhan <> wrote:
    >> On Oct 7, 11:19 am, Sudarshan Narasimhan <>
    >> wrote:
    >> > [snip]
    >> > Now, i am being asked to come up with a much faster lookup better than
    >> > O(log n).

    >
    > For 100 elements, this make an average of around 5 comparisons.


    Indeed, it hardly looks worthwhile to use a hash, even if these
    comparisons are key comparisons, that is, string comparisons
    (and given the described homogeneity of these strings, they may be
    rather costly).


    However, if the set of keys is predetermined (from the description it
    looks like all possible keys could be enumerated fairly early, and
    wouldn't change often), then you could use a perfect hash. The
    generation of the perfect hash function may be costly, but it will be
    amortized over usage, or even done at compile time if the keys don't
    change at run time. The generated perfect hash function may well need
    to do fewer than 5 character comparisons.

    --
    __Pascal Bourguignon__
    Pascal J. Bourguignon, Oct 8, 2009
    #5
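    To illustrate what a perfect hash amounts to (this is hand-written
    for three invented keys, not actual gperf output): for a fixed key
    set, a couple of distinguishing character positions yield a
    collision-free index, followed by one full comparison to reject
    unknown keys, as gperf-generated code also does:

```cpp
#include <string>

// Hypothetical fixed key set, known at compile time.
static const char* const kKeys[3] = {
    "1-A-1-T1-1", "1-A-1-T1-2", "1-B-2-E1-1"
};

// Perfect hash for exactly this key set: the characters at positions
// 2 and 9 distinguish all three keys without collisions.
// Returns the key's index, or -1 for any other string.
int perfectIndex(const std::string& key) {
    if (key.size() != 10) return -1;
    int idx = (key[2] - 'A') * 2 + (key[9] - '1');
    // One full comparison verifies the candidate and rejects
    // unknown keys that happen to map into range.
    if (idx < 0 || idx >= 3 || key != kKeys[idx]) return -1;
    return idx;
}
```

    A real key set would need different distinguishing positions (gperf
    searches for them automatically), but the shape of the generated
    function is the same: cheap index computation, then one verify.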
  6. "Pascal J. Bourguignon" <> wrote in message news:...
    [snip]

    > It hardly looks like it is worthwhile to use a hash indeed, even if
    > these comparisons are key comparisons, that is, string comparisons
    > (and given the described homogeneity of these strings, they be rather
    > costly).
    >
    >
    > However, if the set of keys is predetermined (from the description it
    > looks like all possible keys could be enumerated fairly early, and
    > wouldn't change often), then you could use a perfect hash. The
    > generation of the perfect hash function may be costly, but will be
    > amortized on usage or even done at compilation time if the keys don't
    > change at run-time. The generated perfect hash function may well need
    > to do less than 5 character comparison.
    >
    > --
    > __Pascal Bourguignon__


    Or just build a tree. Lookup then costs character comparisons only
    up to the length of the string.
    Bill Davy, Oct 8, 2009
    #6
  7. On 8 Oct, 14:43, "Bill Davy" <> wrote:
    [snip]
    > Or just build a tree.  Character comparisons to the length of the string.


    Yes, a trie (prefix tree) might perform even better.

    --
    Michael
    Michael Doubez, Oct 8, 2009
    #7
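    A minimal sketch of such a trie (written with C++11 features for
    brevity; the node layout and the int payload are placeholders). A
    lookup costs at most one child-map step per character of the key,
    independent of how many keys are stored:

```cpp
#include <map>
#include <memory>
#include <string>

// One trie node per character position; -1 marks "no key ends here".
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    int value = -1;
};

class Trie {
public:
    void insert(const std::string& key, int value) {
        TrieNode* node = &root_;
        for (char c : key) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) child.reset(new TrieNode);  // create on demand
            node = child.get();
        }
        node->value = value;
    }
    // Returns the stored value, or -1 if the key is absent.
    int find(const std::string& key) const {
        const TrieNode* node = &root_;
        for (char c : key) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return -1;
            node = it->second.get();
        }
        return node->value;
    }
private:
    TrieNode root_;
};
```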
