Perfomance Considerations

A

Alexander Adam

Hi,

What I am having is this -- I got a string start (type is wchar_t) and
the end of a string to compare. Now I've got around 500 words to
compare to and if matched, I want to get an unique integer for the
word. Currently this is done by filling up a std::map<wstring, int>
with all words and doing an: iterator it = my_map.find(wchar_t* value)
and return it->second if found, otherwise return 0. Now this routine
takes pretty much time and I was wondering whether doing a manual list
of

if (strncmp(...) == 0)
...
else if (strncmp(...) == 0)
...

would be faster / more efficient? I was also thinking about doing
something like

switch (*str)
{
case 'a':
if (strncmp(...))
}

i.e. first compaing the first char and then do some deeper evaluation
to avoid calling the strncmp subroutine but I doubt that this even
makes sense.

Any idea so far or any closer description required? Basically as said,
I am having a wchar_t* str_start and an wchar_t* str_end which I want
to compare to predefined words and match an unique integer code for
it.

Regards
Alexander
 
R

red floyd

Alexander said:
Hi,

What I am having is this -- I got a string start (type is wchar_t) and
the end of a string to compare. Now I've got around 500 words to
compare to and if matched, I want to get an unique integer for the
word. Currently this is done by filling up a std::map<wstring, int>
with all words and doing an: iterator it = my_map.find(wchar_t* value)
and return it->second if found, otherwise return 0. Now this routine
takes pretty much time and I was wondering whether doing a manual list
of

if (strncmp(...) == 0)
..
else if (strncmp(...) == 0)
..

would be faster / more efficient? I was also thinking about doing
something like

switch (*str)
{
case 'a':
if (strncmp(...))
}

i.e. first compaing the first char and then do some deeper evaluation
to avoid calling the strncmp subroutine but I doubt that this even
makes sense.

Any idea so far or any closer description required? Basically as said,
I am having a wchar_t* str_start and an wchar_t* str_end which I want
to compare to predefined words and match an unique integer code for
it.

This looks like a clear case of Hoare's Dictum. Have you even
determined if this is your bottleneck? Have you looked into alternate
datastructures and algorithms? Is there a reason your're using strncmp
instead of wcsncmp (since you're dealing with wchar_t)? Why don't you
convert your wchar_t array to a wstring?

There's lots of stuff to be done up front before you worry about this
sort of micro-optimization.
 
A

Andre Kostur

Hi,

What I am having is this -- I got a string start (type is wchar_t) and
the end of a string to compare. Now I've got around 500 words to
compare to and if matched, I want to get an unique integer for the
word. Currently this is done by filling up a std::map<wstring, int>
with all words and doing an: iterator it = my_map.find(wchar_t* value)
and return it->second if found, otherwise return 0. Now this routine
takes pretty much time and I was wondering whether doing a manual list
of

if (strncmp(...) == 0)
..
else if (strncmp(...) == 0)
..

would be faster / more efficient? I was also thinking about doing

As Red Floyd has mentioned... profile first. However, how would you expect
a linear search to be able to beat the tree search (except for certain
degenerate cases)? Unless you have some statistical knowledge of how often
the words occur, the map will use on average 9 comparisons where your
linear search will use 250 (If you happen to know that 1 particular word
occurs 99.9% of the time, the linear search would kick the tree's butt if
you look for that word first. Then again, you could look for that word
first, then search the map if it doesn't match.). I would think that this
change would be going backwards in terms of performance. (But of course
you'd have to measure it to be sure). Perhaps a hashing container would
net you some benefits (see std::tr1::unordered_map<>). Depends on how
expensive the hashing function is I suppose.
 
M

Marco Manfredini

Alexander said:
Hi,

What I am having is this -- I got a string start (type is wchar_t) and
the end of a string to compare. Now I've got around 500 words to
compare to and if matched, I want to get an unique integer for the
word. Currently this is done by filling up a std::map<wstring, int>
with all words and doing an: iterator it = my_map.find(wchar_t* value)
and return it->second if found, otherwise return 0. Now this routine
takes pretty much time and I was wondering whether doing a manual list
of

if (strncmp(...) == 0)
..
else if (strncmp(...) == 0)
..

No, because on average you have to make 250 compares per word, while
std::map does 8 or 9. If some of your search words show up more often
than others then you might sort the compares, but that's not really
helping you.
would be faster / more efficient? I was also thinking about doing
something like

switch (*str)
{
case 'a':
if (strncmp(...))
}

Yes that's better. I don't want to bother you with further hints on how
to improve this further, or if you should use hash_maps instead. If
(what I guess from your suggestion to strcmp everything) you have a
*fixed* number of words that you would like to recognize, then a very
good bet is a code generator which turns your wordlist into a piece of
code to recognize these words as fast a possible.
I'm quite happy with re2c (http://re2c.org/), which, with its default
settings, basically does the switch-thing, alas to the bitter end. And
it supports wide-characters.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,011
Latest member
AjaUqq1950

Latest Threads

Top