Need a reliable way for faster lookup.

Discussion in 'C++' started by Sudarshan Narasimhan, Oct 7, 2009.

  1. All,

    I am developing a module on a telecom box. In my application I have a
    requirement to build a fault management framework. I used std::map to
    create a generic fault table which accesses fault objects by a string
    key.

    Now I am being asked to come up with a much faster lookup, better
    than O(log n). Most of the keys have a numeric structure to them,
    e.g. 1-A-1-T1-1 (which corresponds to a specific port on a card, on a
    slot, on a chassis, etc.), so I thought I could define a static array
    of pointers and manage the lookup that way.

    The problem is that I would then have to write a search function for
    every table: the key above is just one format, and there are many
    object types with different formats, so the array would have to be
    defined and interpreted differently for each. Also, some tables may
    contain strings that do not follow a numeric pattern like the one
    above.

    Do I need to use a hash function for this, or do you have other
    alternatives in mind? Can you refer me to a standard hash function (I
    haven't written a hash before), or suggest an idea for one? The nice
    thing about a hash function is that it works generically on any
    string, so I would only have to implement it in the base class and
    let any table that inherits from the generic table use it.

    Thanks in advance,
    Sud
    Sudarshan Narasimhan, Oct 7, 2009
    #1
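    For reference, the current design described in this post might look
    roughly like the following sketch (FaultObject and the member names
    are invented for illustration):

```cpp
#include <map>
#include <string>

// Hypothetical fault object; the real one would carry severity,
// timestamps, and so on.
struct FaultObject {
    std::string description;
};

// Generic fault table as described in the post: fault objects keyed by
// string, looked up through std::map in O(log n) comparisons.
class GenericFaultTable {
public:
    void add(const std::string& key, const FaultObject& fault) {
        table_[key] = fault;
    }
    // Returns a pointer to the fault object, or null if the key is absent.
    const FaultObject* find(const std::string& key) const {
        std::map<std::string, FaultObject>::const_iterator it =
            table_.find(key);
        return it == table_.end() ? 0 : &it->second;
    }
private:
    std::map<std::string, FaultObject> table_;
};
```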

  2. On Oct 7, 11:19 am, Sudarshan Narasimhan <>
    wrote:
    [snip]


    I need to mention here that the number of objects in a table is not
    huge; it is on the order of a few tens to a hundred. I am not sure a
    hash function would work well without many collisions in such a
    case, because I would have to take the hash mod the number of
    objects to index into the array.
    Sudarshan Narasimhan, Oct 7, 2009
    #2

  3. On 7 Oct, 08:42, Sudarshan Narasimhan <> wrote:
    > On Oct 7, 11:19 am, Sudarshan Narasimhan <>
    > wrote:
    > > I am developing a module on a telecom box. In my application, i have a
    > > requirement for building a fault management framework, i used std::map
    > > to create a generic fault table which accesses fault objects by using
    > > a key(string).

    >
    > > Now, i am being asked to come up with a much faster lookup better than
    > > O(log n).


    For 100 elements, that makes an average of around 5 comparisons.

    > > The thing with the key for the search is that most of them
    > > have a numeric background to them. For eg. 1-A-1-T1-1 (which
    > > corresponds to a specific port on a card, on a slot, on a chassis
    > > etc). So i thought i could define an array of pointers(statically) and
    > > manage this.


    Pointers to what?

    > > But the problem is that, i would have to write a search function for
    > > every table(because the above object is just one type, there are many
    > > such objects of different formats, hence the array would have to
    > > defined and interpreted differently). Also, for some tables, there may
    > > be strings in them which do not have a numeric pattern as above.

    >
    > > Do i need to use a hash function for the above? OR do you have any
    > > other alternatives in mind?


    std::tr1::unordered_map, or the non-standard hash_map.

    > > Can you refer me to a standard hash
    > > function(havent done a has earlier) or give idea for one?


    If you know all the possible names beforehand, you can generate a
    perfect hash (with gperf, for example).

    Otherwise, I've had good results with the Dan Bernstein hash (the
    newer version).

    > > The good
    > > thing is that if i implement a hash function, i can use it generically
    > > on any string, so i would have to implement it only in the base class
    > > and let any table which inherits from the generic table use this
    > > method.

    >
    > I need to mention here that the number of objects in a table is not
    > huge. Its in the order of a few tens to a hundred. Not sure if the
    > hash function would work fine in such a case with out many collisions
    > because i have to do mod(NUM of Objects) to lookup an array for the
    > same.


    The fewer collisions the better. After that, it is a matter of how
    many buckets you can afford.

    --
    Michael
    Michael Doubez, Oct 7, 2009
    #3
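    To make the suggestion above concrete, here is a sketch of the newer
    (XOR) variant of the Bernstein hash plugged into an unordered map.
    It is shown with std::unordered_map; on a TR1-era compiler the
    equivalent container is std::tr1::unordered_map. FaultId stands in
    for the real fault object type:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Dan Bernstein's string hash, the newer variant:
// multiply by 33, then XOR in the next byte.
struct DjbHash {
    std::size_t operator()(const std::string& s) const {
        std::size_t h = 5381;
        for (std::size_t i = 0; i < s.size(); ++i)
            h = (h * 33) ^ static_cast<unsigned char>(s[i]);
        return h;
    }
};

// Fault table keyed by strings such as "1-A-1-T1-1",
// with average O(1) lookup instead of std::map's O(log n).
typedef int FaultId;
typedef std::unordered_map<std::string, FaultId, DjbHash> FaultTable;
```

    The container handles the "mod number of buckets" step and
    collision chaining itself, so a table of a few tens of entries
    needs no special handling.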
  4. On Oct 7, 8:57 am, Michael Doubez <> wrote:
    > On 7 oct, 08:42, Sudarshan Narasimhan <> wrote:


    > > On Oct 7, 11:19 am, Sudarshan Narasimhan
    > > <> wrote:
    > > > I am developing a module on a telecom box. In my
    > > > application, i have a requirement for building a fault
    > > > management framework, i used std::map to create a generic
    > > > fault table which accesses fault objects by using a
    > > > key(string).


    > > > Now, i am being asked to come up with a much faster lookup
    > > > better than O(log n).


    > For 100 elements, this make an average of around 5 comparisons.


    And for 1000, about 10.

    My own benchmarks show that, under the particular conditions I
    ran them (Sun Sparc, my data sets, etc.), the best hash tables
    start to outperform std::map somewhere between 150 and 200
    elements, but that the difference doesn't become significant
    until the tables start to be even larger.

    > > > The thing with the key for the search is that most of them
    > > > have a numeric background to them. For eg. 1-A-1-T1-1
    > > > (which corresponds to a specific port on a card, on a
    > > > slot, on a chassis etc). So i thought i could define an
    > > > array of pointers(statically) and manage this.


    > Pointers to what ?


    I'm not sure I've understood what he means exactly either, but
    if the issue is that his keys tend to start with common
    prefixes (which significantly increases the time required for
    each comparison), then simply starting the comparison from the
    end might be a simple fix.

    > > > But the problem is that, i would have to write a search
    > > > function for every table(because the above object is just
    > > > one type, there are many such objects of different
    > > > formats, hence the array would have to defined and
    > > > interpreted differently). Also, for some tables, there may
    > > > be strings in them which do not have a numeric pattern as
    > > > above.


    > > > Do i need to use a hash function for the above? OR do you
    > > > have any other alternatives in mind?


    > std::tr1::unordered_map or non-standard hash_map.


    > > > Can you refer me to a standard hash function(havent done a
    > > > has earlier) or give idea for one?


    > If you know all the possible names beforehand, you can
    > generate a perfect hash (with gperf by example).


    > Otherwise, I've had good results with Dan Bernstein hash (the
    > new version).


    The Bernstein hash does tend to work fairly well, although it
    has some known weaknesses. Using 31 as a multiplier (instead of
    33), as does Java, gives similar results---maybe slightly
    better, but you really want a larger multiplier to avoid
    collisions when very short strings are used to index into a very
    large table. (For a string with two ASCII characters, the
    Bernstein hash will always result in a value less than 4318.)
    I've found 127 to work well, and FNV is also highly recommended.
    (It tends to be slow on machines with slow multiplication, like
    a Sparc, but on an Intel, that shouldn't be a problem.)

    > > > The good thing is that if i implement a hash function, i
    > > > can use it generically on any string, so i would have to
    > > > implement it only in the base class and let any table
    > > > which inherits from the generic table use this method.


    > > I need to mention here that the number of objects in a table
    > > is not huge. Its in the order of a few tens to a hundred.
    > > Not sure if the hash function would work fine in such a case
    > > with out many collisions because i have to do mod(NUM of
    > > Objects) to lookup an array for the same.


    > The less collision the better. Afterward, it is a matter of
    > how many bucket you can afford.


    There are two issues: the number of collisions, and the speed it
    takes to calculate the hash code. Using a CRC will generally
    result in a very low collision rate, but when you factor in the
    cost of calculating it, you lose any advantage over using one
    of the multiply-and-add/xor algorithms (which have only very few
    more collisions, and cost significantly less to calculate). If
    the table only contains a few tens to a hundred entries,
    however, I doubt that a hash table will outperform an std::map
    with an "optimized" comparison function. ("Optimized" according
    to the structure of the key strings.)

    For some (possibly outdated) results of the actual benchmarks I
    ran some years ago, see
    http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html. (My
    site seems to be accessible again.) I've been meaning to update
    this for some time, but haven't gotten around to it---still, for
    the data set consisting of 75 URLs, std::map is faster than any
    of the hash codes, even though most of the strings contain a
    common prefix ("http://"). (Note that when I did this, I wasn't
    aware of the Bernstein hash. Later measurements, however, show no
    significant difference between it and the Java hash. I've not
    tried the modified Bernstein hash, or, for that matter, changing
    addition to xor in any of the hashes which originally use
    addition. Another thing for the TODO list.)

    --
    James Kanze
    James Kanze, Oct 8, 2009
    #4
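    The "start the comparison from the end" idea mentioned above can be
    expressed as a custom comparator for std::map. Since keys like
    1-A-1-T1-1 share long common prefixes but tend to differ near the
    end, comparing the reversed strings finds the first difference
    sooner. A sketch (the comparator name is invented):

```cpp
#include <algorithm>
#include <map>
#include <string>

// Orders strings by comparing from the last character backwards.
// This is still a strict weak ordering, so std::map works unchanged;
// only the (irrelevant) iteration order of the table differs.
struct CompareFromEnd {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(a.rbegin(), a.rend(),
                                            b.rbegin(), b.rend());
    }
};

// Fault table with the "optimized" comparison; int stands in for
// the real fault object type.
typedef std::map<std::string, int, CompareFromEnd> FaultTable;
```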
  5. Michael Doubez <> writes:

    > On 7 oct, 08:42, Sudarshan Narasimhan <> wrote:
    >> On Oct 7, 11:19 am, Sudarshan Narasimhan <>
    >> wrote:
    >> > [snip]
    >> > Now, i am being asked to come up with a much faster lookup better than
    >> > O(log n).

    >
    > For 100 elements, this make an average of around 5 comparisons.


    Indeed, it hardly looks worthwhile to use a hash, even if these
    comparisons are key comparisons, that is, string comparisons
    (and given the described homogeneity of these strings, they may be
    rather costly).


    However, if the set of keys is predetermined (from the description it
    looks like all possible keys could be enumerated fairly early, and
    wouldn't change often), then you could use a perfect hash. The
    generation of the perfect hash function may be costly, but it will be
    amortized over usage, or even done at compile time if the keys don't
    change at run time. The generated perfect hash function may well need
    to do fewer than 5 character comparisons.

    --
    __Pascal Bourguignon__
    Pascal J. Bourguignon, Oct 8, 2009
    #5
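    To illustrate what a perfect hash amounts to (this is hand-written
    for three invented keys, not actual gperf output): for a fixed key
    set, a couple of distinguishing character positions yield a
    collision-free index, followed by one full comparison to reject
    unknown keys, as gperf-generated code also does:

```cpp
#include <string>

// Hypothetical fixed key set, known at compile time.
static const char* const kKeys[3] = {
    "1-A-1-T1-1", "1-A-1-T1-2", "1-B-2-E1-1"
};

// Perfect hash for exactly this key set: the characters at positions
// 2 and 9 distinguish all three keys without collisions.
// Returns the key's index, or -1 for any other string.
int perfectIndex(const std::string& key) {
    if (key.size() != 10) return -1;
    int idx = (key[2] - 'A') * 2 + (key[9] - '1');
    // One full comparison verifies the candidate and rejects
    // unknown keys that happen to map into range.
    if (idx < 0 || idx >= 3 || key != kKeys[idx]) return -1;
    return idx;
}
```

    A real key set would need different distinguishing positions (gperf
    searches for them automatically), but the shape of the generated
    function is the same: cheap index computation, then one verify.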
  6. "Pascal J. Bourguignon" <> wrote in message news:...
    [snip]

    > It hardly looks like it is worthwhile to use a hash indeed, even if
    > these comparisons are key comparisons, that is, string comparisons
    > (and given the described homogeneity of these strings, they be rather
    > costly).
    >
    >
    > However, if the set of keys is predetermined (from the description it
    > looks like all possible keys could be enumerated fairly early, and
    > wouldn't change often), then you could use a perfect hash. The
    > generation of the perfect hash function may be costly, but will be
    > amortized on usage or even done at compilation time if the keys don't
    > change at run-time. The generated perfect hash function may well need
    > to do less than 5 character comparison.
    >
    > --
    > __Pascal Bourguignon__


    Or just build a tree. Lookup then costs character comparisons only
    up to the length of the string.
    Bill Davy, Oct 8, 2009
    #6
  7. On 8 Oct, 14:43, "Bill Davy" <> wrote:
    [snip]
    > Or just build a tree.  Character comparisons to the length of the string.


    Yes, a trie (prefix tree) might perform even better.

    --
    Michael
    Michael Doubez, Oct 8, 2009
    #7
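    A minimal sketch of such a trie (written with C++11 features for
    brevity; the node layout and the int payload are placeholders). A
    lookup costs at most one child-map step per character of the key,
    independent of how many keys are stored:

```cpp
#include <map>
#include <memory>
#include <string>

// One trie node per character position; -1 marks "no key ends here".
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    int value = -1;
};

class Trie {
public:
    void insert(const std::string& key, int value) {
        TrieNode* node = &root_;
        for (char c : key) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) child.reset(new TrieNode);  // create on demand
            node = child.get();
        }
        node->value = value;
    }
    // Returns the stored value, or -1 if the key is absent.
    int find(const std::string& key) const {
        const TrieNode* node = &root_;
        for (char c : key) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return -1;
            node = it->second.get();
        }
        return node->value;
    }
private:
    TrieNode root_;
};
```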
