Perfomance Considerations

Alexander Adam · Nov 20, 2007

Hi,

What I am having is this -- I got a string start (type is wchar_t) and
the end of a string to compare. Now I've got around 500 words to
compare to and if matched, I want to get an unique integer for the
word. Currently this is done by filling up a std::map<wstring, int>
with all words and doing an: iterator it = my_map.find(wchar_t* value)
and return it->second if found, otherwise return 0. Now this routine
takes pretty much time and I was wondering whether doing a manual list
of

if (strncmp(...) == 0)
...
else if (strncmp(...) == 0)
...

would be faster / more efficient? I was also thinking about doing
something like

switch (*str)
{
case 'a':
if (strncmp(...))
}

i.e. first compaing the first char and then do some deeper evaluation
to avoid calling the strncmp subroutine but I doubt that this even
makes sense.

Any idea so far or any closer description required? Basically as said,
I am having a wchar_t* str_start and an wchar_t* str_end which I want
to compare to predefined words and match an unique integer code for
it.

Regards
Alexander

red floyd · Nov 20, 2007

Alexander said:
Hi,

What I am having is this -- I got a string start (type is wchar_t) and
the end of a string to compare. Now I've got around 500 words to
compare to and if matched, I want to get an unique integer for the
word. Currently this is done by filling up a std::map<wstring, int>
with all words and doing an: iterator it = my_map.find(wchar_t* value)
and return it->second if found, otherwise return 0. Now this routine
takes pretty much time and I was wondering whether doing a manual list
of

if (strncmp(...) == 0)
..
else if (strncmp(...) == 0)
..

would be faster / more efficient? I was also thinking about doing
something like

switch (*str)
{
case 'a':
if (strncmp(...))
}

i.e. first compaing the first char and then do some deeper evaluation
to avoid calling the strncmp subroutine but I doubt that this even
makes sense.

Any idea so far or any closer description required? Basically as said,
I am having a wchar_t* str_start and an wchar_t* str_end which I want
to compare to predefined words and match an unique integer code for
it.

This looks like a clear case of Hoare's Dictum. Have you even
determined if this is your bottleneck? Have you looked into alternate
datastructures and algorithms? Is there a reason your're using strncmp
instead of wcsncmp (since you're dealing with wchar_t)? Why don't you
convert your wchar_t array to a wstring?

There's lots of stuff to be done up front before you worry about this
sort of micro-optimization.

Andre Kostur · Nov 20, 2007

Hi,

What I am having is this -- I got a string start (type is wchar_t) and
the end of a string to compare. Now I've got around 500 words to
compare to and if matched, I want to get an unique integer for the
word. Currently this is done by filling up a std::map<wstring, int>
with all words and doing an: iterator it = my_map.find(wchar_t* value)
and return it->second if found, otherwise return 0. Now this routine
takes pretty much time and I was wondering whether doing a manual list
of

if (strncmp(...) == 0)
..
else if (strncmp(...) == 0)
..

would be faster / more efficient? I was also thinking about doing

As Red Floyd has mentioned... profile first. However, how would you expect
a linear search to be able to beat the tree search (except for certain
degenerate cases)? Unless you have some statistical knowledge of how often
the words occur, the map will use on average 9 comparisons where your
linear search will use 250 (If you happen to know that 1 particular word
occurs 99.9% of the time, the linear search would kick the tree's butt if
you look for that word first. Then again, you could look for that word
first, then search the map if it doesn't match.). I would think that this
change would be going backwards in terms of performance. (But of course
you'd have to measure it to be sure). Perhaps a hashing container would
net you some benefits (see std::tr1::unordered_map<>). Depends on how
expensive the hashing function is I suppose.

Marco Manfredini · Nov 20, 2007

Alexander said:
Hi,

What I am having is this -- I got a string start (type is wchar_t) and
the end of a string to compare. Now I've got around 500 words to
compare to and if matched, I want to get an unique integer for the
word. Currently this is done by filling up a std::map<wstring, int>
with all words and doing an: iterator it = my_map.find(wchar_t* value)
and return it->second if found, otherwise return 0. Now this routine
takes pretty much time and I was wondering whether doing a manual list
of

if (strncmp(...) == 0)
..
else if (strncmp(...) == 0)
..

No, because on average you have to make 250 compares per word, while
std::map does 8 or 9. If some of your search words show up more often
than others then you might sort the compares, but that's not really
helping you.

would be faster / more efficient? I was also thinking about doing
something like

switch (*str)
{
case 'a':
if (strncmp(...))
}

Yes that's better. I don't want to bother you with further hints on how
to improve this further, or if you should use hash_maps instead. If
(what I guess from your suggestion to strcmp everything) you have a
*fixed* number of words that you would like to recognize, then a very
good bet is a code generator which turns your wordlist into a piece of
code to recognize these words as fast a possible.
I'm quite happy with re2c (http://re2c.org/), which, with its default
settings, basically does the switch-thing, alas to the bitter end. And
it supports wide-characters.

Some perfomance thoughts required	4	Jun 20, 2008
Instantiate a basic_string with a text value independent of char or wchar_t	4	Jun 18, 2013
Strings as non-type template parameters	11	May 31, 2010
Ambiguity by making overloaded operator function-const - why?	7	Jun 1, 2008
Converting EBCDIC to Unicode	3	Sep 28, 2010
Algorithm considerations	10	Feb 26, 2007
Tic Tac Toe Game	2	Mar 10, 2024
Python, Linux, and the setuid bit	8	Apr 14, 2014

Perfomance Considerations

Alexander Adam

red floyd

Andre Kostur

Marco Manfredini

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads