Perfomance Considerations

Discussion in 'C++' started by Alexander Adam, Nov 20, 2007.

  1. Hi,

    What I am having is this -- I got a string start (type is wchar_t) and
    the end of a string to compare. Now I've got around 500 words to
    compare to and if matched, I want to get an unique integer for the
    word. Currently this is done by filling up a std::map<wstring, int>
    with all words and doing an: iterator it = my_map.find(wchar_t* value)
    and return it->second if found, otherwise return 0. Now this routine
    takes pretty much time and I was wondering whether doing a manual list
    of

    if (strncmp(...) == 0)
    ...
    else if (strncmp(...) == 0)
    ...

    would be faster / more efficient? I was also thinking about doing
    something like

    switch (*str)
    {
    case 'a':
    if (strncmp(...))
    }

    i.e. first compaing the first char and then do some deeper evaluation
    to avoid calling the strncmp subroutine but I doubt that this even
    makes sense.

    Any idea so far or any closer description required? Basically as said,
    I am having a wchar_t* str_start and an wchar_t* str_end which I want
    to compare to predefined words and match an unique integer code for
    it.

    Regards
    Alexander
    Alexander Adam, Nov 20, 2007
    #1
    1. Advertising

  2. Alexander Adam

    red floyd Guest

    Alexander Adam wrote:
    > Hi,
    >
    > What I am having is this -- I got a string start (type is wchar_t) and
    > the end of a string to compare. Now I've got around 500 words to
    > compare to and if matched, I want to get an unique integer for the
    > word. Currently this is done by filling up a std::map<wstring, int>
    > with all words and doing an: iterator it = my_map.find(wchar_t* value)
    > and return it->second if found, otherwise return 0. Now this routine
    > takes pretty much time and I was wondering whether doing a manual list
    > of
    >
    > if (strncmp(...) == 0)
    > ..
    > else if (strncmp(...) == 0)
    > ..
    >
    > would be faster / more efficient? I was also thinking about doing
    > something like
    >
    > switch (*str)
    > {
    > case 'a':
    > if (strncmp(...))
    > }
    >
    > i.e. first compaing the first char and then do some deeper evaluation
    > to avoid calling the strncmp subroutine but I doubt that this even
    > makes sense.
    >
    > Any idea so far or any closer description required? Basically as said,
    > I am having a wchar_t* str_start and an wchar_t* str_end which I want
    > to compare to predefined words and match an unique integer code for
    > it.
    >


    This looks like a clear case of Hoare's Dictum. Have you even
    determined if this is your bottleneck? Have you looked into alternate
    datastructures and algorithms? Is there a reason your're using strncmp
    instead of wcsncmp (since you're dealing with wchar_t)? Why don't you
    convert your wchar_t array to a wstring?

    There's lots of stuff to be done up front before you worry about this
    sort of micro-optimization.
    red floyd, Nov 20, 2007
    #2
    1. Advertising

  3. Alexander Adam

    Andre Kostur Guest

    Alexander Adam <> wrote in news:e552122e-6fbc-439e-ba09-
    :

    > Hi,
    >
    > What I am having is this -- I got a string start (type is wchar_t) and
    > the end of a string to compare. Now I've got around 500 words to
    > compare to and if matched, I want to get an unique integer for the
    > word. Currently this is done by filling up a std::map<wstring, int>
    > with all words and doing an: iterator it = my_map.find(wchar_t* value)
    > and return it->second if found, otherwise return 0. Now this routine
    > takes pretty much time and I was wondering whether doing a manual list
    > of
    >
    > if (strncmp(...) == 0)
    > ..
    > else if (strncmp(...) == 0)
    > ..
    >
    > would be faster / more efficient? I was also thinking about doing


    As Red Floyd has mentioned... profile first. However, how would you expect
    a linear search to be able to beat the tree search (except for certain
    degenerate cases)? Unless you have some statistical knowledge of how often
    the words occur, the map will use on average 9 comparisons where your
    linear search will use 250 (If you happen to know that 1 particular word
    occurs 99.9% of the time, the linear search would kick the tree's butt if
    you look for that word first. Then again, you could look for that word
    first, then search the map if it doesn't match.). I would think that this
    change would be going backwards in terms of performance. (But of course
    you'd have to measure it to be sure). Perhaps a hashing container would
    net you some benefits (see std::tr1::unordered_map<>). Depends on how
    expensive the hashing function is I suppose.
    Andre Kostur, Nov 20, 2007
    #3
  4. Alexander Adam wrote:

    > Hi,
    >
    > What I am having is this -- I got a string start (type is wchar_t) and
    > the end of a string to compare. Now I've got around 500 words to
    > compare to and if matched, I want to get an unique integer for the
    > word. Currently this is done by filling up a std::map<wstring, int>
    > with all words and doing an: iterator it = my_map.find(wchar_t* value)
    > and return it->second if found, otherwise return 0. Now this routine
    > takes pretty much time and I was wondering whether doing a manual list
    > of
    >
    > if (strncmp(...) == 0)
    > ..
    > else if (strncmp(...) == 0)
    > ..


    No, because on average you have to make 250 compares per word, while
    std::map does 8 or 9. If some of your search words show up more often
    than others then you might sort the compares, but that's not really
    helping you.

    >
    > would be faster / more efficient? I was also thinking about doing
    > something like
    >
    > switch (*str)
    > {
    > case 'a':
    > if (strncmp(...))
    > }


    Yes that's better. I don't want to bother you with further hints on how
    to improve this further, or if you should use hash_maps instead. If
    (what I guess from your suggestion to strcmp everything) you have a
    *fixed* number of words that you would like to recognize, then a very
    good bet is a code generator which turns your wordlist into a piece of
    code to recognize these words as fast a possible.
    I'm quite happy with re2c (http://re2c.org/), which, with its default
    settings, basically does the switch-thing, alas to the bitter end. And
    it supports wide-characters.

    --
    IYesNo yes=YesNoFactory.getFactoryInstance().YES;
    yes.getDescription().equals(array[0].toUpperCase());
    Marco Manfredini, Nov 20, 2007
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Gnanaprakash Rathinam

    CreateInstranceAndUnwrap slow perfomance

    Gnanaprakash Rathinam, Dec 29, 2004, in forum: ASP .Net
    Replies:
    5
    Views:
    2,109
    David Levine
    Dec 30, 2004
  2. Fredrik Melin

    Server perfomance

    Fredrik Melin, Oct 27, 2004, in forum: ASP .Net
    Replies:
    1
    Views:
    409
    Paul Glavich [MVP - ASP.NET]
    Oct 27, 2004
  3. Scott Reynolds

    data perfomance?

    Scott Reynolds, Mar 1, 2005, in forum: ASP .Net
    Replies:
    3
    Views:
    417
    Karl Seguin
    Mar 1, 2005
  4. George

    Huge HTML output perfomance

    George, Mar 28, 2005, in forum: ASP .Net
    Replies:
    10
    Views:
    667
    Robbe Morris [C# MVP]
    Mar 29, 2005
  5. James T.

    Perfomance test

    James T., Feb 26, 2006, in forum: ASP .Net
    Replies:
    2
    Views:
    404
    James T.
    Feb 26, 2006
Loading...

Share This Page