Compare without regard to case

JKop

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".


Thanks,

-JKop
 
Gernot Frisch

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

stricmp(str1, str2);
 
Sharad Kala

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

One way is to inherit from char_traits<...> and provide the necessary
overrides. Then use a basic_string instantiated with that traits class
instead of std::string. This is discussed in this GotW series -
http://www.gotw.ca/gotw/029.htm


Sharad
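The traits approach could be sketched roughly as follows. This is only an illustration in the spirit of the GotW article, not a copy of it; the names ci_char_traits, ci_string, fold and test_ci_string are made up for this example, and the folding uses the C locale's tolower, which is an assumption about what "ignoring case" should mean.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits class: behaves like std::char_traits<char>, except
// that every comparison folds both characters to lower case first.
struct ci_char_traits : std::char_traits<char> {
    // Convert via unsigned char so tolower never sees a negative value.
    static int fold(char c) {
        return std::tolower(static_cast<unsigned char>(c));
    }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }

    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], a)) return s + i;
        }
        return nullptr;
    }
};

// Hypothetical string type that compares without regard to case.
typedef std::basic_string<char, ci_char_traits> ci_string;

void test_ci_string()
{
    ci_string name("Kernel32.DLL");
    assert(name == "kernel32.dll");
    assert(ci_string("abc") < ci_string("ABD"));
    assert(name.find("dll") == 9);
}
```

Note that ci_string is a different type from std::string, so the two do not convert to each other implicitly and ci_string does not work with the std::ostream inserters out of the box.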
 
John Harrison

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".


Thanks,

-JKop

There is no standard C++ function for doing that. You could write something
yourself using the toupper or tolower functions which operate on individual
characters.

john
 
JKop

John Harrison posted:
There is no standard C++ function for doing that. You could write
something yourself using the toupper or tolower functions which operate
on individual characters.

john

Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function.

Thanks,

-JKop
 
Tim Love

JKop said:
Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function,

Functions like this have been posted here in the past, based around
lines like
transform(text.begin(),text.end(),text.begin(),toupper);
 
Peter Koch Larsen

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".


Thanks,

-JKop

This is not as simple as it sounds, and that is one of the reasons there
is no "standard" C++ function of that type.
One difficulty is that different countries have different rules for
collation. Sometimes the rules even differ by context (a dictionary versus
a telephone book), and sometimes they differ according to the meaning of
the word.
For an explanation of this, go to comp.lang.cpp.moderated and search
for recent discussions there (I believe it started in August and lasted
about a month).

Kind regards
Peter
 
David Fisher

JKop said:
Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function.

#include <cctype> // for tolower()
#include <cassert>

// returns < 0 if s1 < s2, > 0 if s1 > s2 or 0 if the strings
// are equal (without regard to case)
// ie. behaves like strcmp()

int stricmp(const char *s1, const char *s2)
{
while (*s1 && *s2)
{
if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}
}

return (*s1 ? 1 : (*s2 ? -1 : 0));
}

int stricmp(std::string s1, std::string s2)
{
return stricmp(s1.c_str(), s2.c_str());
}

void test_stricmp()
{
assert(stricmp("abc", "abc") == 0);
assert(stricmp("abc", "ABC") == 0);
assert(stricmp("abc", "DEF") < 0);
assert(stricmp("ABC", "def") < 0);
assert(stricmp("DEF", "abc") > 0);
assert(stricmp("def", "ABC") > 0);
assert(stricmp("abc", "abca") < 0);
assert(stricmp("abca", "abc") > 0);
assert(stricmp("", "") == 0);
assert(stricmp("", "a") < 0);
assert(stricmp("a", "") > 0);
}

David Fisher
Sydney, Australia
 
Julie

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

Do you want to compare (for collation), or to strictly test for equality?
 
John Harrison

David Fisher said:
#include <cctype> // for tolower()
#include <cassert>

// returns < 0 if s1 < s2, > 0 if s1 > s2 or 0 if the strings
// are equal (without regard to case)
// ie. behaves like strcmp()

int stricmp(const char *s1, const char *s2)
{
while (*s1 && *s2)
{
if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}
}

return (*s1 ? 1 : (*s2 ? -1 : 0));
}

int stricmp(std::string s1, std::string s2)
{
return stricmp(s1.c_str(), s2.c_str());
}

void test_stricmp()
{
assert(stricmp("abc", "abc") == 0);
assert(stricmp("abc", "ABC") == 0);
assert(stricmp("abc", "DEF") < 0);
assert(stricmp("ABC", "def") < 0);
assert(stricmp("DEF", "abc") > 0);
assert(stricmp("def", "ABC") > 0);
assert(stricmp("abc", "abca") < 0);
assert(stricmp("abca", "abc") > 0);
assert(stricmp("", "") == 0);
assert(stricmp("", "a") < 0);
assert(stricmp("a", "") > 0);
}

David Fisher
Sydney, Australia

It's an error to pass a char to tolower. The valid inputs for tolower are
integers in the range 0 to UCHAR_MAX, and EOF. Because char may be signed,
passing a char to tolower may result in a negative value being passed,
and that has undefined behaviour. Unsigned char is not a problem.

For the same reason

transform(text.begin(),text.end(),text.begin(),toupper);

suggested by Tim Love is also invalid.
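One way to keep the transform idiom while staying inside the valid input range is to convert each character to unsigned char before it reaches toupper. The following is a minimal sketch; the helper name to_upper_copy and its self-test are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns an upper-cased copy of text.
// The lambda takes its argument as unsigned char, so toupper always
// receives a value in the range 0 to UCHAR_MAX.
std::string to_upper_copy(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return text;
}

void test_to_upper_copy()
{
    assert(to_upper_copy("Hello, World!") == "HELLO, WORLD!");
    assert(to_upper_copy("") == "");
}
```

Wrapping toupper in a lambda also avoids passing the bare name toupper to transform, which can be ambiguous when the <locale> overload is visible as well.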

Also this statement

if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}

is buggy because s1 and s2 have already been incremented in the if
condition by the time the subtraction is done.

So I'd suggest

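int stricmp(const char *s1, const char *s2)
{
    while (*s1 && *s2)
    {
        // go through unsigned char so tolower never gets a negative value
        unsigned char ch1 = *s1;
        unsigned char ch2 = *s2;
        if (tolower(ch1) != tolower(ch2))
        {
            // subtract the folded values, so the sign follows the
            // case-insensitive order ("abc" vs "DEF" gives < 0)
            return tolower(ch1) - tolower(ch2);
        }
        ++s1;
        ++s2;
    }

    return (*s1 ? 1 : (*s2 ? -1 : 0));
}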

but I haven't tested it.

John
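Since the original question also asked about std::string, here is a hedged sketch of a three-way comparison that works on std::string directly instead of going through c_str(), which would stop at an embedded '\0'. The name compare_no_case is made up for this example, and the folding again uses tolower from <cctype> with the unsigned char conversion discussed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns < 0, 0 or > 0 like strcmp(), comparing
// the two strings without regard to case.
int compare_no_case(const std::string& a, const std::string& b)
{
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
    {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb)
        {
            return ca - cb;
        }
    }
    // Common prefix is equal: the shorter string sorts first.
    if (a.size() == b.size())
    {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}

void test_compare_no_case()
{
    assert(compare_no_case("abc", "ABC") == 0);
    assert(compare_no_case("abc", "DEF") < 0);
    assert(compare_no_case("DEF", "abc") > 0);
    assert(compare_no_case("abc", "abca") < 0);
    assert(compare_no_case("a", "") > 0);
}
```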
 
David Fisher

John said:
It's an error to pass a char to tolower. The valid inputs for tolower are
integers in the range 0 to UCHAR_MAX, and EOF. Because char may be signed,
passing a char to tolower may result in a negative value being passed,
and that has undefined behaviour. Unsigned char is not a problem.

I see your point, but it's very surprising ... most people would expect
something like tolower('A') to return 'a'. I guess it's only a problem
for character sets with upper case characters >= 128 decimal (in ASCII,
upper case letters are from 65 to 90). Are there any character sets
like this you are aware of ? (I don't know EBCDIC).

BTW the UNIX manual entry on my machine says that for any values other
than upper case letters, the argument value is returned unchanged (rather
than being undefined behaviour).

John said:
Also this statement

if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}

is bugged because s1 and s2 will be incremented in the if statement before
the subtraction is done.

Oops .. of course it was a deliberate mistake ... :p

Thanks for the comments,

David Fisher
Sydney, Australia
 
Kai-Uwe Bux

JKop said:
John Harrison posted:


Okay not to be too "do stuff for me"ish, but if some-one
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function.

Thanks,

-JKop

Ignoring case is relative to the locale you want to use. The following code
uses the global locale by default:


#include <locale>
#include <string>
#include <iostream>

bool
string_equal_to_ignoring_case ( std::string a,
                                std::string b,
                                std::locale loc = std::locale() ) {
  if ( a.size() != b.size() ) {
    return( false );
  }
  std::string::size_type length = a.size();
  for ( std::string::size_type i = 0;
        i < length;
        ++i ) {
    // compare the i-th characters, folded to lower case in the given locale
    if ( std::tolower( a[i], loc ) != std::tolower( b[i], loc ) ) {
      return( false );
    }
  }
  return( true );
}


int main ( void ) {
std::string a ( "Hello World!" );
std::string b ( "hello world!" );

std::cout << string_equal_to_ignoring_case( a, b ) << '\n';
}


Best

Kai-Uwe Bux
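A variation on the same locale-based idea, offered only as a sketch: look up the locale's std::ctype<char> facet once and reuse it for every character, rather than calling the two-argument std::tolower each time. The function name equal_ignoring_case is invented for this example; std::use_facet and std::ctype<char>::tolower are the standard library calls it relies on.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: true if a and b are equal when both are folded to
// lower case according to loc (the global locale by default).
bool equal_ignoring_case(const std::string& a,
                         const std::string& b,
                         const std::locale& loc = std::locale())
{
    if (a.size() != b.size()) {
        return false;
    }
    // Fetch the character classification facet once for the whole string.
    const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);
    return std::equal(a.begin(), a.end(), b.begin(),
                      [&facet](char x, char y) {
                          return facet.tolower(x) == facet.tolower(y);
                      });
}

void test_equal_ignoring_case()
{
    const std::locale& c_locale = std::locale::classic();
    assert(equal_ignoring_case("Hello World!", "hello world!", c_locale));
    assert(!equal_ignoring_case("abc", "abcd", c_locale));
    assert(!equal_ignoring_case("abc", "abd", c_locale));
}
```

Like the code above, this only answers "equal or not"; it compares character by character and so says nothing about collation order.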
 
John Harrison

David Fisher said:
I see your point, but it's very surprising ... most people would expect
something like tolower('A') to return 'a'.

tolower((unsigned char)'A') will return 'a'.

David Fisher said:
I guess it's only a problem
for character sets with upper case characters >= 128 decimal (in ASCII,
upper case letters are from 65 to 90). Are there any character sets
like this you are aware of ? (I don't know EBCDIC).

Passing any negative value (other than EOF) to any of the character
classification routines (tolower, islower, isalpha etc) is undefined
behaviour. If you are sure that your 8 bit char string will only ever
contain character codes in the range 0 to 127 then there is no problem. But
you can't be sure of that in a library routine like stricmp.

David Fisher said:
BTW the UNIX manual entry on my machine says that for any values other
than upper case letters, the argument value is returned unchanged (rather
than being undefined behaviour).

C99 standard 7.4 para 1, 'In all cases [talking about <ctype.h>] the
argument is an int, the value of which shall be representable as an unsigned
char or shall equal the macro EOF'.

But passing a char to ctype.h routines is such a common practice that I
wouldn't be surprised if most compilers accepted negative values and defined
some reasonable behaviour for them.

john
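To make the rule quoted from the C standard concrete, here is a small sketch. Both function names are hypothetical: is_valid_ctype_argument simply restates the requirement as a predicate, and to_lower_char is one possible wrapper that applies the unsigned char conversion in a single place.

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <cstdio>

// Hypothetical predicate restating the requirement: the argument must be
// representable as an unsigned char or be equal to EOF.
inline bool is_valid_ctype_argument(int value)
{
    return value == EOF || (value >= 0 && value <= UCHAR_MAX);
}

// Hypothetical wrapper: lower-cases a plain char without ever handing
// tolower a negative value.
inline char to_lower_char(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

void test_ctype_argument()
{
    assert(is_valid_ctype_argument('A'));
    assert(is_valid_ctype_argument(EOF));
    assert(is_valid_ctype_argument(UCHAR_MAX));
    assert(!is_valid_ctype_argument(UCHAR_MAX + 1));
    assert(to_lower_char('A') == 'a');
    assert(to_lower_char('7') == '7');
}
```

On an implementation where plain char is signed, a character with the high bit set converts to a negative int, which fails the predicate unless it happens to equal EOF; after conversion to unsigned char it always passes.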
 
JKop

Julie posted:
Do you want to compare (for collation), or to strictly test for equality?

Actually, it's for filenames.

kernel32.dll

and

Kernel32.DLL

and

KerNel32.DlL

are the same file!

-JKop
 
JKop

John Harrison posted:
tolower((unsigned char)'A') will return 'a'.


(unsigned char)'A' disgusts me!


unsigned char('A')


Also, if you're going for ultimate efficiency:

The input char:

char k = 'A';

unsigned char& uk = *reinterpret_cast<unsigned char*>(&k);

(I first thought of using a union but the above is better)


-JKop
 
Peter Koch Larsen

JKop said:
Julie posted:


Actually, it's for filenames.

kernel32.dll

and

Kernel32.DLL

and

KerNel32.DlL

are the same file!

-JKop

In that case you should compare the same way Windows does. I do not know if
Windows compares according to the standard locale on the machine, but my
guess is that it uses some homegrown scheme, where e.g. the Danish
letter "ø" compares equal to "Ø" but the German sharp s "ß" (which looks
like the Greek beta) is not equal to "SS".

/Peter
 
