Need a reliable way for faster lookup.

  • Thread starter Sudarshan Narasimhan


Sudarshan Narasimhan

All,

I am developing a module on a telecom box. In my application I have a
requirement to build a fault management framework; I used std::map to
create a generic fault table that accesses fault objects by a string
key.

Now I am being asked to come up with a much faster lookup, better than
O(log n). The thing about the search keys is that most of them have a
numeric structure. For example, 1-A-1-T1-1 corresponds to a specific
port on a card, on a slot, on a chassis, etc. So I thought I could
define an array of pointers statically and manage this.

But the problem is that I would have to write a search function for
every table (because the object above is just one type; there are many
such objects of different formats, so the array would have to be
defined and interpreted differently for each). Also, some tables may
contain strings that do not follow a numeric pattern like the one
above.

Do I need to use a hash function for this? Or do you have any other
alternatives in mind? Can you refer me to a standard hash function (I
haven't written a hash before) or give me an idea for one? The good
thing is that if I implement a hash function, I can use it generically
on any string, so I would only have to implement it in the base class
and let any table that inherits from the generic table use it.

Thanks in advance,
Sud
 


Sudarshan Narasimhan


I should mention that the number of objects in a table is not huge; it
is on the order of a few tens to a hundred. I am not sure whether a
hash function would work well in such a case without many collisions,
because I would have to take the hash mod the number of objects to
index into an array.
 

Michael Doubez

For 100 elements, that makes an average of around 5 comparisons.

Pointers to what?

std::tr1::unordered_map or non-standard hash_map.

If you know all the possible names beforehand, you can generate a
perfect hash (with gperf, for example).

Otherwise, I've had good results with the Dan Bernstein hash (the new
version).

I should mention that the number of objects in a table is not huge; it
is on the order of a few tens to a hundred. I am not sure whether a
hash function would work well in such a case without many collisions,
because I would have to take the hash mod the number of objects to
index into an array.

The fewer collisions the better. After that, it is a matter of how
many buckets you can afford.
 

James Kanze

For 100 elements, that makes an average of around 5 comparisons.

And for 1000, about 10.

My own benchmarks show that, under the particular conditions I
ran them (Sun Sparc, my data sets, etc.), the best hash tables
start to outperform std::map somewhere between 150 and 200
elements, but that the difference doesn't become significant
until the tables start to be even larger.
Pointers to what?

I'm not sure I've understood what he means exactly either, but
if the issue is that his keys tend to start with common
prefixes (which significantly increases the time required for
each comparison), then simply starting the comparison from the
end might be a simple fix.
std::tr1::unordered_map or non-standard hash_map.
If you know all the possible names beforehand, you can
generate a perfect hash (with gperf, for example).
Otherwise, I've had good results with the Dan Bernstein hash
(the new version).

The Bernstein hash does tend to work fairly well, although it
has some known weaknesses. Using 31 as a multiplier (instead of
33), as does Java, gives similar results---maybe slightly
better, but you really want a larger multiplier to avoid
collisions when very short strings are used to index into a very
large table. (For a string with two ASCII characters, the
Bernstein hash will always result in a value less than 4318.)
I've found 127 to work well, and FNV is also highly recommended.
(It tends to be slow on machines with slow multiplication, like
a Sparc, but on an Intel, that shouldn't be a problem.)
The fewer collisions the better. After that, it is a matter of
how many buckets you can afford.

There are two issues: the number of collisions, and the time it
takes to calculate the hash code. Using a CRC will generally
result in a very low collision rate, but when you factor in the
cost of calculating it, you lose any advantage over using one
of the multiply-and-add/xor algorithms (which have only a very
few more collisions, and cost significantly less to calculate).
If the table only contains a few tens to a hundred entries,
however, I doubt that a hash table will outperform an std::map
with an "optimized" comparison function. ("Optimized" according
to the structure of the key strings.)

For some (possibly outdated) results of the actual benchmarks I
ran some years ago, see
http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html. (My
site seems to be accessible again.) I've been meaning to update
this for some time, but haven't gotten around to it. Still, for
the data set consisting of 75 URLs, std::map is faster than any
of the hash codes, even though most of the strings contain a
common prefix ("http://"). (Note that when I did this, I wasn't
aware of the Bernstein hash. Later measurements, however, show no
significant difference between it and the Java hash. I've not
tried the modified Bernstein hash, or for that matter, changing
addition to xor in any of the hashes that originally use
addition. Another thing for the TODO list.)
 

Pascal J. Bourguignon

Michael Doubez said:
For 100 elements, that makes an average of around 5 comparisons.

It hardly looks worthwhile to use a hash indeed, even if
these comparisons are key comparisons, that is, string comparisons
(and given the described homogeneity of these strings, they may be
rather costly).


However, if the set of keys is predetermined (from the description it
looks like all possible keys could be enumerated fairly early, and
wouldn't change often), then you could use a perfect hash. Generating
the perfect hash function may be costly, but that cost is amortized
over usage, or it can even be done at compile time if the keys don't
change at run time. The generated perfect hash function may well need
to do fewer than 5 character comparisons.
 

Bill Davy

Pascal J. Bourguignon said:
However, if the set of keys is predetermined, then you could use a
perfect hash. The generated perfect hash function may well need to do
fewer than 5 character comparisons.

Or just build a tree (a trie): lookup costs character comparisons up to the length of the string.
 