Case insensitive set of strings

Discussion in 'C++' started by Adrian, Apr 17, 2007.

  1. Adrian

    Adrian Guest

    Hi,

    I want a const static std::set of strings which is case insensitive
    for the values.

    So I have the following which seems to work but something doesnt seem
    right about it. Is there a better way or any gotcha's from my code
    below.


    TIA

    Adrian

    #include <iostream>
    #include <functional>
    #include <algorithm>
    #include <set>
    #include <string>
    #include <iterator>

    class Test
    {
    public:
    void p()
    {
    std::copy(fields.begin(), fields.end(),
    std::eek:stream_iterator<std::string>(std::cout, ","));
    std::cout << std::endl;
    }
    private:
    struct nocase_cmp : public std::binary_function<const
    std::string &, const std::string &, bool>
    {
    struct nocase_char_cmp : public std::binary_function<char,
    char, bool>
    {
    bool operator()(char a, char b)
    {
    return std::toupper(a) < std::toupper(b);
    }
    };
    bool operator()(const std::string &a, const std::string &b)
    {
    return std::lexicographical_compare(a.begin(), a.end(),
    b.begin(), b.end(),
    nocase_char_cmp());
    }
    };

    typedef std::set<std::string, nocase_cmp> Field_names_t;
    static const Field_names_t fields;
    };
    const char *f[]={
    "string1",
    "string2",
    "string3",
    "STRIng1",
    "string5"};

    const Test::Field_names_t Test::fields(f, f+5);

    int main(int argc, char *argv[])
    {
    Test t;
    t.p();

    return 0;
    }
     
    Adrian, Apr 17, 2007
    #1
    1. Advertising

  2. Adrian

    Mark P Guest

    Adrian wrote:
    > Hi,
    >
    > I want a const static std::set of strings which is case insensitive
    > for the values.
    >
    > So I have the following which seems to work but something doesnt seem
    > right about it. Is there a better way or any gotcha's from my code
    > below.
    >


    I don't see anything obviously wrong with your code. However if your
    set is _constant_ then std::set may be overkill (and incur needless time
    and space penalties). A possibly more efficient approach would be to
    use a sorted vector and the various binary search functions of the
    standard library (lower_bound, upper_bound, binary_search, etc.).

    Mark

    >
    > TIA
    >
    > Adrian
    >
    > #include <iostream>
    > #include <functional>
    > #include <algorithm>
    > #include <set>
    > #include <string>
    > #include <iterator>
    >
    > class Test
    > {
    > public:
    > void p()
    > {
    > std::copy(fields.begin(), fields.end(),
    > std::eek:stream_iterator<std::string>(std::cout, ","));
    > std::cout << std::endl;
    > }
    > private:
    > struct nocase_cmp : public std::binary_function<const
    > std::string &, const std::string &, bool>
    > {
    > struct nocase_char_cmp : public std::binary_function<char,
    > char, bool>
    > {
    > bool operator()(char a, char b)
    > {
    > return std::toupper(a) < std::toupper(b);
    > }
    > };
    > bool operator()(const std::string &a, const std::string &b)
    > {
    > return std::lexicographical_compare(a.begin(), a.end(),
    > b.begin(), b.end(),
    > nocase_char_cmp());
    > }
    > };
    >
    > typedef std::set<std::string, nocase_cmp> Field_names_t;
    > static const Field_names_t fields;
    > };
    > const char *f[]={
    > "string1",
    > "string2",
    > "string3",
    > "STRIng1",
    > "string5"};
    >
    > const Test::Field_names_t Test::fields(f, f+5);
    >
    > int main(int argc, char *argv[])
    > {
    > Test t;
    > t.p();
    >
    > return 0;
    > }
    >
     
    Mark P, Apr 17, 2007
    #2
    1. Advertising

  3. Adrian

    James Kanze Guest

    On Apr 17, 9:30 pm, Adrian <> wrote:

    > I want a const static std::set of strings which is case insensitive
    > for the values.


    > So I have the following which seems to work but something doesnt seem
    > right about it. Is there a better way or any gotcha's from my code
    > below.


    Your code has undefined behavior.

    > #include <iostream>
    > #include <functional>
    > #include <algorithm>
    > #include <set>
    > #include <string>
    > #include <iterator>


    Don't forget:
    #include <cctype>
    (or <locale>, if you use the toupper functions from there).

    > class Test
    > {
    > public:
    > void p()
    > {
    > std::copy(fields.begin(), fields.end(),
    > std::eek:stream_iterator<std::string>(std::cout, ","));
    > std::cout << std::endl;
    > }
    > private:
    > struct nocase_cmp : public std::binary_function<const
    > std::string &, const std::string &, bool>
    > {
    > struct nocase_char_cmp : public std::binary_function<char,
    > char, bool>
    > {
    > bool operator()(char a, char b)


    The function should be const, I think.

    > {
    > return std::toupper(a) < std::toupper(b);


    Calling the single argument form of toupper with a char as
    argument is undefined behavior. The argument type is int, with
    the constraint that the value of the int must be either EOF, or
    in the range [0...UCHAR_MAX]. If char is signed, it won't be in
    range when converted (implicitly) to int.

    There are two solutions here: either explicitly convert the char
    to unsigned char before calling toupper, e.g.:

    return toupper( static_cast< unsigned char >( a ) )
    < toupper( static_cast< unsigned char >( b ) ) ;

    or use the two operator forms in std::ctype. (In that case, I
    would use something like:

    class nocase_char_cmp
    {
    public:
    typedef std::ctype< char >
    ctype ;
    explicit nocase_char_cmp(
    std::locale const& l = std::locale() )
    : my_ctype( &std::use_facet< ctype >( l ) )
    {
    }

    bool operator()( char a, char b ) const
    {
    return my_ctype->tolower( a ) < my_ctype->toupper( a ) ;
    }

    private:
    ctype const* my_ctype ;
    } ;

    ..)

    If you have a lot of case insensitive comparisons, it might be
    worth writing a case insensitive collate facet (or there might
    even be one available ready-made); in that case, just pass an
    std::locale with this facet as the fifth argument to
    lexicographical_compare, and you're done with it.

    > }
    > };
    > bool operator()(const std::string &a, const std::string &b)
    > {
    > return std::lexicographical_compare(a.begin(), a.end(),
    > b.begin(), b.end(),
    > nocase_char_cmp());
    > }
    > };


    > typedef std::set<std::string, nocase_cmp> Field_names_t;
    > static const Field_names_t fields;};


    > const char *f[]={
    > "string1",
    > "string2",
    > "string3",
    > "STRIng1",
    > "string5"};


    Try throwing in some characters whose encoding results in a
    negative number, and see what happens. (On my machine, just
    about any accented character will do the trick. In my test
    suites, I'll generally make sure that there is a ÿ somewhere,
    since in the most frequent encoding, it is 0xFF, which, when
    stored into a char, becomes -1, or EOF. You'd be surprised how
    many programs stop when they encounter this character in a
    file.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Apr 18, 2007
    #3
  4. Adrian

    James Kanze Guest

    On Apr 17, 11:42 pm, Mark P <>
    wrote:
    > Adrian wrote:


    > > I want a const static std::set of strings which is case insensitive
    > > for the values.


    > > So I have the following which seems to work but something doesnt seem
    > > right about it. Is there a better way or any gotcha's from my code
    > > below.


    > I don't see anything obviously wrong with your code. However if your
    > set is _constant_ then std::set may be overkill (and incur needless time
    > and space penalties). A possibly more efficient approach would be to
    > use a sorted vector and the various binary search functions of the
    > standard library (lower_bound, upper_bound, binary_search, etc.).


    In that case, you probably want char const*[], and use
    lower_bound on it. (Sort the original data in the editor, e.g.
    mark the block, and pipe it through the system utility sort with
    the correct options for case insensistivity.) That way, you get
    static initialization and thus avoid any order of initialization
    problems. (Depending on the use, it might even be simpler to
    not bother sorting it, and use std::find. Unless the actual
    table has hundreds of entries, you're probably not likely to
    notice the difference.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Apr 18, 2007
    #4
  5. Adrian

    Adrian Guest

    On Apr 18, 2:09 am, James Kanze <> wrote:
    > On Apr 17, 9:30 pm, Adrian <> wrote:
    >
    > > I want a const static std::set of strings which is case insensitive
    > > for the values.
    > > So I have the following which seems to work but something doesnt seem
    > > right about it. Is there a better way or any gotcha's from my code
    > > below.

    >
    > Your code has undefined behavior.

    Thanks James,

    I thought the implicit would work without thinking much - a little
    test proves it wont :)

    Thanks for the facet stuff - will have to read more as I've not used
    them before.

    Adrian
     
    Adrian, Apr 18, 2007
    #5
  6. Adrian

    Adrian Guest

    On Apr 18, 2:14 am, James Kanze <> wrote:
    > On Apr 17, 11:42 pm, Mark P <>
    > wrote:
    >
    > > Adrian wrote:
    > > > I want a const static std::set of strings which is case insensitive
    > > > for the values.
    > > > So I have the following which seems to work but something doesnt seem
    > > > right about it. Is there a better way or any gotcha's from my code
    > > > below.

    > > I don't see anything obviously wrong with your code. However if your
    > > set is _constant_ then std::set may be overkill (and incur needless time
    > > and space penalties). A possibly more efficient approach would be to
    > > use a sorted vector and the various binary search functions of the
    > > standard library (lower_bound, upper_bound, binary_search, etc.).

    >
    > In that case, you probably want char const*[], and use
    > lower_bound on it. (Sort the original data in the editor, e.g.
    > mark the block, and pipe it through the system utility sort with
    > the correct options for case insensistivity.) That way, you get
    > static initialization and thus avoid any order of initialization
    > problems. (Depending on the use, it might even be simpler to
    > not bother sorting it, and use std::find. Unless the actual
    > table has hundreds of entries, you're probably not likely to
    > notice the difference.)


    The strings in the set I have control over - its incomming data that
    will be case insensitive that I need to match against the set.

    I started with const char *[] but the set removes issues of including
    duplicates.

    Adrian
     
    Adrian, Apr 18, 2007
    #6
  7. Adrian

    Mark P Guest

    Adrian wrote:
    > On Apr 18, 2:14 am, James Kanze <> wrote:
    >> On Apr 17, 11:42 pm, Mark P <>
    >> wrote:
    >>
    >>> Adrian wrote:
    >>>> I want a const static std::set of strings which is case insensitive
    >>>> for the values.
    >>>> So I have the following which seems to work but something doesnt seem
    >>>> right about it. Is there a better way or any gotcha's from my code
    >>>> below.
    >>> I don't see anything obviously wrong with your code. However if your
    >>> set is _constant_ then std::set may be overkill (and incur needless time
    >>> and space penalties). A possibly more efficient approach would be to
    >>> use a sorted vector and the various binary search functions of the
    >>> standard library (lower_bound, upper_bound, binary_search, etc.).

    >> In that case, you probably want char const*[], and use
    >> lower_bound on it. (Sort the original data in the editor, e.g.
    >> mark the block, and pipe it through the system utility sort with
    >> the correct options for case insensistivity.) That way, you get
    >> static initialization and thus avoid any order of initialization
    >> problems. (Depending on the use, it might even be simpler to
    >> not bother sorting it, and use std::find. Unless the actual
    >> table has hundreds of entries, you're probably not likely to
    >> notice the difference.)

    >
    > The strings in the set I have control over - its incomming data that
    > will be case insensitive that I need to match against the set.
    >
    > I started with const char *[] but the set removes issues of including
    > duplicates.
    >


    Look at std::unique if that's your concern. The comparative advantage
    of a set is that it handles dynamic data efficiently.

    Mark
     
    Mark P, Apr 18, 2007
    #7
  8. Adrian

    James Kanze Guest

    On Apr 18, 3:32 pm, Adrian <> wrote:
    > On Apr 18, 2:14 am, James Kanze <> wrote:
    > > On Apr 17, 11:42 pm, Mark P <>
    > > wrote:


    > > > Adrian wrote:
    > > > > I want a const static std::set of strings which is case insensitive
    > > > > for the values.
    > > > > So I have the following which seems to work but something doesnt seem
    > > > > right about it. Is there a better way or any gotcha's from my code
    > > > > below.
    > > > I don't see anything obviously wrong with your code. However if your
    > > > set is _constant_ then std::set may be overkill (and incur needless time
    > > > and space penalties). A possibly more efficient approach would be to
    > > > use a sorted vector and the various binary search functions of the
    > > > standard library (lower_bound, upper_bound, binary_search, etc.).


    > > In that case, you probably want char const*[], and use
    > > lower_bound on it. (Sort the original data in the editor, e.g.
    > > mark the block, and pipe it through the system utility sort with
    > > the correct options for case insensistivity.) That way, you get
    > > static initialization and thus avoid any order of initialization
    > > problems. (Depending on the use, it might even be simpler to
    > > not bother sorting it, and use std::find. Unless the actual
    > > table has hundreds of entries, you're probably not likely to
    > > notice the difference.)


    > The strings in the set I have control over - its incomming data that
    > will be case insensitive that I need to match against the set.


    In other words, you have to determine whether the incoming data
    isin the set or not.

    > I started with const char *[] but the set removes issues of including
    > duplicates.


    If the set is const, presumably you haven't put any duplicates
    in there to begin with. I'll often do things like:

    char const* table[] =
    {
    "string A",
    "string B",
    // ...
    } ;

    then filter the lines between (but not including) the { and the
    } through "sort"; adding a -u option to sort would eliminate
    duplicates, and putting a 'tr '[:upper:]' '[:lower:]' at the
    head of the pipe will ensure that everything is lower case.
    Then, use std::find or std::lowerbound, accordingly. (std::find
    is easier, and probably sufficient if the set only has a couple
    of dozen elements.)

    Although likely not an issue in your code, the advantage of
    doing this is that all data initialization is static, and you
    avoid order of initialization issues.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Apr 19, 2007
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Tee
    Replies:
    3
    Views:
    7,904
    Herfried K. Wagner [MVP]
    Jun 23, 2004
  2. Diego Martins
    Replies:
    0
    Views:
    352
    Diego Martins
    Aug 22, 2006
  3. Replies:
    1
    Views:
    2,547
    Mark P
    Apr 6, 2007
  4. J055
    Replies:
    2
    Views:
    3,313
  5. Xah Lee
    Replies:
    4
    Views:
    1,042
Loading...

Share This Page