Compare without regard to case

JKop · Sep 15, 2004

Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

Thanks,

-JKop

Gernot Frisch · Sep 15, 2004

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

stricmp(str1, str2);

Sharad Kala · Sep 15, 2004

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

One way is to inherit from char_traits<...> and provide the necessary
overrides. Then use that class instead of std::string. This is discussed in
this GotW series - http://www.gotw.ca/gotw/029.htm
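
For readers who cannot follow the link, here is a minimal sketch of the traits approach, assuming plain char strings and the current C locale. The names ci_char_traits and ci_string are illustrative, and the details may differ from the GotW article.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative traits class: characters are compared after folding
// them to upper case with toupper from the current C locale.
struct ci_char_traits : public std::char_traits<char> {
    static int fold(char c) {
        // Cast first: the <cctype> functions need a value in 0..UCHAR_MAX.
        return std::toupper(static_cast<unsigned char>(c));
    }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }

    // Three-way comparison of the first n characters, ignoring case.
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    // First position in [s, s + n) that matches a, ignoring case.
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], a)) return s + i;
        }
        return nullptr;
    }
};

// A string type whose comparisons ignore case.
typedef std::basic_string<char, ci_char_traits> ci_string;
```

Note that ci_string is a different type from std::string, so converting between the two has to go through c_str() or iterators.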

Sharad

John Harrison · Sep 15, 2004

Gernot Frisch said:
stricmp(str1, str2);

stricmp is not standard C or C++.

john

Sharad Kala · Sep 15, 2004

Gernot Frisch said:
stricmp(str1, str2);

Isn't that non-standard?

John Harrison · Sep 15, 2004

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

Thanks,

-JKop

There is no standard C++ function for doing that. You could write something
yourself using the toupper or tolower functions which operate on individual
characters.
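
As a rough sketch of that idea for std::string, assuming only equality is needed and that the current C locale's tolower is good enough. The helper name iequals is made up for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Illustrative helper: true if a and b are equal ignoring case.
inline bool iequals(const std::string& a, const std::string& b) {
    // Strings of different length can never match character by character.
    if (a.size() != b.size()) return false;
    return std::equal(a.begin(), a.end(), b.begin(),
                      [](char x, char y) {
                          // The casts keep the arguments inside the range
                          // that the <cctype> functions accept.
                          return std::tolower(static_cast<unsigned char>(x)) ==
                                 std::tolower(static_cast<unsigned char>(y));
                      });
}
```

For "char*" arguments the same function can be called directly, since the pointers convert to temporary std::string objects.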

john

Gernot Frisch · Sep 15, 2004

stricmp is not standard C or C++.

Uh!? But strcmp is?

John Harrison · Sep 15, 2004

Gernot Frisch said:
Uh!? But strcmp is?

Right.

john

JKop · Sep 15, 2004

John Harrison posted:

There is no standard C++ function for doing that. You could write
something yourself using the toupper or tolower functions which operate
on individual characters.

john

Okay, not to be too "do stuff for me"-ish, but if someone
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function?

Thanks,

-JKop

Tim Love · Sep 15, 2004

JKop said:
Okay, not to be too "do stuff for me"-ish, but if someone
has already written such a function,

Functions like this have been posted here in the past, based around
lines like
transform(text.begin(),text.end(),text.begin(),toupper);
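
A hedged sketch of how such a line might be wrapped up in practice. The function name to_upper_copy is invented here. Passing a bare toupper to transform can also fail to compile when the <locale> overload of std::toupper is visible, so the sketch uses a lambda instead.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Illustrative helper: returns an upper-cased copy of text,
// using toupper from the current C locale.
inline std::string to_upper_copy(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   // Taking the character as unsigned char keeps the value
                   // passed to toupper non-negative.
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return text;
}
```

Two strings can then be compared without regard to case by comparing their upper-cased copies, at the cost of two temporary strings.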

Peter Koch Larsen · Sep 15, 2004

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

Thanks,

-JKop

This is not as simple as it sounds, and that is one of the reasons there
is no "standard" C++ function of that type.
One difficulty is that different countries have
different rules for collation. Sometimes the rules even differ according
to context (is it a dictionary or a telephone book?) and sometimes the rules
even differ according to the meaning of the word.
For an explanation of this, go to comp.lang.cpp.moderated and search
for recent discussions there (I believe it started in August and lasted
about a month).
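
To make the point concrete, here is a small sketch showing two ordering rules that disagree about the same pair of strings. It assumes an ASCII-based system. The names less_classic and less_folded are made up, and neither is meant as a correct collation for any particular language.

```cpp
#include <algorithm>
#include <cctype>
#include <locale>
#include <string>

// Ordering according to the collate facet of the classic "C" locale,
// which on ASCII systems puts all upper case letters before lower case.
inline bool less_classic(const std::string& a, const std::string& b) {
    return std::locale::classic()(a, b);
}

// Illustrative alternative rule: order after folding to lower case.
inline bool less_folded(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        });
}
```

Under the first rule "B" sorts before "a", and under the second it sorts after. Real locales add further rules on top of this.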

Kind regards
Peter

David Fisher · Sep 16, 2004

JKop said:
Okay, not to be too "do stuff for me"-ish, but if someone
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function?

#include <cctype> // for tolower()
#include <cassert>
#include <string>

// returns < 0 if s1 < s2, > 0 if s1 > s2 or 0 if the strings
// are equal (without regard to case)
// ie. behaves like strcmp()

int stricmp(const char *s1, const char *s2)
{
    while (*s1 && *s2)
    {
        if (tolower(*s1++) != tolower(*s2++))
        {
            return (int) tolower(*s1) - (int) tolower(*s2);
        }
    }

    return (*s1 ? 1 : (*s2 ? -1 : 0));
}

int stricmp(std::string s1, std::string s2)
{
    return stricmp(s1.c_str(), s2.c_str());
}

void test_stricmp()
{
    assert(stricmp("abc", "abc") == 0);
    assert(stricmp("abc", "ABC") == 0);
    assert(stricmp("abc", "DEF") < 0);
    assert(stricmp("ABC", "def") < 0);
    assert(stricmp("DEF", "abc") > 0);
    assert(stricmp("def", "ABC") > 0);
    assert(stricmp("abc", "abca") < 0);
    assert(stricmp("abca", "abc") > 0);
    assert(stricmp("", "") == 0);
    assert(stricmp("", "a") < 0);
    assert(stricmp("a", "") > 0);
}

David Fisher
Sydney, Australia

Julie · Sep 16, 2004

JKop said:
Haven't been able to find such a thing.

Can anyone please inform me of a Standard C++ function for
comparing two strings without regard to case. Both for
working with "char*", and with "std::string".

Do you want to compare (for collation), or to strictly test for equality?

John Harrison · Sep 16, 2004

David Fisher said:
#include <cctype> // for tolower()
#include <cassert>

// returns < 0 if s1 < s2, > 0 if s1 > s2 or 0 if the strings
// are equal (without regard to case)
// ie. behaves like strcmp()

int stricmp(const char *s1, const char *s2)
{
    while (*s1 && *s2)
    {
        if (tolower(*s1++) != tolower(*s2++))
        {
            return (int) tolower(*s1) - (int) tolower(*s2);
        }
    }

    return (*s1 ? 1 : (*s2 ? -1 : 0));
}

int stricmp(std::string s1, std::string s2)
{
    return stricmp(s1.c_str(), s2.c_str());
}

void test_stricmp()
{
    assert(stricmp("abc", "abc") == 0);
    assert(stricmp("abc", "ABC") == 0);
    assert(stricmp("abc", "DEF") < 0);
    assert(stricmp("ABC", "def") < 0);
    assert(stricmp("DEF", "abc") > 0);
    assert(stricmp("def", "ABC") > 0);
    assert(stricmp("abc", "abca") < 0);
    assert(stricmp("abca", "abc") > 0);
    assert(stricmp("", "") == 0);
    assert(stricmp("", "a") < 0);
    assert(stricmp("a", "") > 0);
}

David Fisher
Sydney, Australia

It's an error to pass a plain char to tolower. The valid inputs for tolower are
integers in the range 0 to UCHAR_MAX, and EOF. Because char may be signed,
passing a char to tolower may result in a negative value being passed,
and that has undefined behaviour. Unsigned char is not a problem.

For the same reason

transform(text.begin(),text.end(),text.begin(),toupper);

suggested by Tim Love is also invalid.

Also this statement

if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}

is bugged because s1 and s2 will be incremented in the if statement before
the subtraction is done.

So I'd suggest

int stricmp(const char *s1, const char *s2)
{
    while (*s1 && *s2)
    {
        unsigned char ch1 = *s1;
        unsigned char ch2 = *s2;
        if (tolower(ch1) != tolower(ch2))
        {
            // compare the folded values so the result ignores case
            return tolower(ch1) - tolower(ch2);
        }
        ++s1;
        ++s2;
    }

    return (*s1 ? 1 : (*s2 ? -1 : 0));
}

but I haven't tested it.

John

David Fisher · Sep 16, 2004

John said:
It's an error to pass a plain char to tolower. The valid inputs for tolower are
integers in the range 0 to UCHAR_MAX, and EOF. Because char may be signed,
passing a char to tolower may result in a negative value being passed,
and that has undefined behaviour. Unsigned char is not a problem.

I see your point, but it's very surprising ... most people would expect
something like tolower('A') to return 'a'. I guess it's only a problem
for character sets with upper case characters >= 128 decimal (in ASCII,
upper case letters are from 65 to 90). Are there any character sets
like this that you are aware of? (I don't know EBCDIC).

BTW the UNIX manual entry on my machine says that for any values other
than upper case letters, the argument value is returned unchanged (rather
than being undefined behaviour).

Also this statement

if (tolower(*s1++) != tolower(*s2++))
{
return (int) tolower(*s1) - (int) tolower(*s2);
}

is bugged because s1 and s2 will be incremented in the if statement before
the subtraction is done.

Oops .. of course it was a deliberate mistake ...

Thanks for the comments,

David Fisher
Sydney, Australia

Kai-Uwe Bux · Sep 16, 2004

JKop said:
John Harrison posted:

Okay, not to be too "do stuff for me"-ish, but if someone
has already written such a function, could they please
copy-paste it here, or perhaps post the code for that
"stricmp" function?

Thanks,

-JKop

Ignoring case is relative to the locale you want to use. The following code
uses the global locale by default:

#include <locale>
#include <string>
#include <iostream>

bool
string_equal_to_ignoring_case ( std::string a,
                                std::string b,
                                std::locale loc = std::locale() ) {
  if ( a.size() != b.size() ) {
    return( false );
  }
  std::string::size_type length = a.size();
  for ( std::string::size_type i = 0;
        i < length;
        ++i ) {
    // compare the i-th characters, folded to lower case in loc
    if ( std::tolower( a[i], loc ) != std::tolower( b[i], loc ) ) {
      return( false );
    }
  }
  return( true );
}

int main ( void ) {
  std::string a ( "Hello World!" );
  std::string b ( "hello world!" );

  std::cout << string_equal_to_ignoring_case( a, b ) << '\n';
}

Best

Kai-Uwe Bux

John Harrison · Sep 16, 2004

David Fisher said:
I see your point, but it's very surprising ... most people would expect
something like tolower('A') to return 'a'.

tolower((unsigned char)'A') will return 'a'.

I guess it's only a problem
for character sets with upper case characters >= 128 decimal (in ASCII,
upper case letters are from 65 to 90). Are there any character sets
like this that you are aware of? (I don't know EBCDIC).

Passing any negative value (other than EOF) to any of the character
classification routines (tolower, islower, isalpha etc) is undefined
behaviour. If you are sure that your 8 bit char string will only ever
contain character codes in the range 0 to 127 then there is no problem. But
you can't be sure of that in a library routine like stricmp.

BTW the UNIX manual entry on my machine says that for any values other
than upper case letters, the argument value is returned unchanged (rather
than being undefined behaviour).

C99 standard 7.4 para 1, 'In all cases [talking about <ctype.h>] the
argument is an int, the value of which shall be representable as an unsigned
char or shall equal the macro EOF'.

But passing a char to ctype.h routines is such a common practice that I
wouldn't be surprised if most compilers accepted negative values and defined
some reasonable behaviour for them.

john

JKop · Sep 16, 2004

Julie posted:

Do you want to compare (for collation), or to strictly test for equality?

Actually, it's for filenames.

kernel32.dll

and

Kernel32.DLL

and

KerNel32.DlL

are the same file!

-JKop

JKop · Sep 16, 2004

tolower((unsigned char)'A') will return 'a'.

(unsigned char)'A' disgusts me!

static_cast<unsigned char>('A')

Also, if you're going for ultimate efficiency:

The input char:

char k = 'A';

unsigned char& uk = *reinterpret_cast<unsigned char*>(&k);

(I first thought of using a union but the above is better)

-JKop

Peter Koch Larsen · Sep 16, 2004

JKop said:
Julie posted:

Actually, it's for filenames.

kernel32.dll

and

Kernel32.DLL

and

KerNel32.DlL

are the same file!

-JKop

In that case you should compare the same way Windows does. I do not know if
Windows compares according to the standard locale on the machine, but my
guess is that it uses some homegrown scheme where, e.g., the Danish
letter "ø" compares equal to "Ø" but the German small double s (which looks like
the Greek beta) is not equal to "SS".

/Peter
