comparing two strcasecmp (stricmp) implementations

William Krick

I'm currently evaluating two implementations of a case insensitive
string comparison function to replace the non-ANSI stricmp(). Both of
the implementations below seem to work fine but I'm wondering if one is
better than the other or if there is some sort of hybrid of the two
that would be superior.


IMPLEMENTATION 1:

#ifndef HAVE_STRCASECMP
#define ccmp(a,b) ((a) == (b) ? 0 : ((a) > (b) ? 1 : -1))
int strcasecmp(unsigned char *s1, unsigned char *s2)
{
    unsigned char c1, c2;
    for ( ; ; )
    {
        if (*s1 == '\0' || *s2 == '\0')
            return ccmp(*s1,*s2);
        c1 = (isascii(*s1) && isupper(*s1)) ? (unsigned char) tolower(*s1) : *s1;
        c2 = (isascii(*s2) && isupper(*s2)) ? (unsigned char) tolower(*s2) : *s2;
        if (c1 != c2)
            return ccmp(c1,c2);
        s1++;
        s2++;
    }
}
#undef ccmp
#endif


IMPLEMENTATION 2:

int strcasecmp(const char *s1, const char *s2)
{
    unsigned char c1, c2;
    do {
        c1 = *s1++;
        c2 = *s2++;
        c1 = (unsigned char) tolower( (unsigned char) c1);
        c2 = (unsigned char) tolower( (unsigned char) c2);
    }
    while((c1 == c2) && (c1 != '\0'));
    return (int) c1-c2;
}
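
One way to compare candidates like these is to run them over the same inputs and check that the signs of their results agree. What follows is only a rough sketch: candidate_a, candidate_b, sign_of and count_disagreements are hypothetical names, and the two candidate bodies are simplified stand-ins rather than the exact code above. It also assumes the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>

typedef int (*cmp_fn)(const char *, const char *);

/* Collapse any result to -1, 0 or 1 so only the sign is compared. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Stand-in candidate A (hypothetical name): folds every byte with tolower(). */
static int candidate_a(const char *s1, const char *s2)
{
    int c1, c2;
    do {
        c1 = tolower((unsigned char) *s1++);
        c2 = tolower((unsigned char) *s2++);
    } while (c1 == c2 && c1 != 0);
    return c1 - c2;
}

/* Stand-in candidate B (hypothetical name): folds only isupper() bytes. */
static int candidate_b(const char *s1, const char *s2)
{
    for (;; s1++, s2++) {
        int c1 = (unsigned char) *s1;
        int c2 = (unsigned char) *s2;
        if (isupper(c1)) c1 = tolower(c1);
        if (isupper(c2)) c2 = tolower(c2);
        if (c1 != c2 || c1 == 0)
            return c1 - c2;
    }
}

/* Count the sample inputs on which f and g disagree about ordering. */
static int count_disagreements(cmp_fn f, cmp_fn g)
{
    static const char *const cases[][2] = {
        { "", "" }, { "a", "" }, { "", "a" }, { "Hello", "hello" },
        { "abc", "ABD" }, { "abc", "ab" }, { "Z", "a" }, { "a_b", "A_B" }
    };
    size_t i;
    int bad = 0;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        int rf = sign_of(f(cases[i][0], cases[i][1]));
        int rg = sign_of(g(cases[i][0], cases[i][1]));
        if (rf != rg)
            bad++;
    }
    return bad;
}
```

Calling count_disagreements(candidate_a, candidate_b) returns the number of sample pairs on which the two orderings differ. The sample table is deliberately small and would need extending before drawing any conclusions.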
 
Chris Dollin

William said:
I'm currently evaluating two implementations of a case insensitive
string comparison function to replace the non-ANSI stricmp(). Both of
the implementations below seem to work fine but I'm wondering if one is
better than the other or if there is some sort of hybrid of the two
that would be superior.


IMPLEMENTATION 1:

#ifndef HAVE_STRCASECMP
#define ccmp(a,b) ((a) == (b) ? 0 : ((a) > (b) ? 1 : -1))
int strcasecmp(unsigned char *s1, unsigned char *s2)
{
unsigned char c1, c2;
for ( ; ; )
{
if (*s1 == '\0' || *s2 == '\0')
return ccmp(*s1,*s2);
c1= (isascii(*s1) && isupper(*s1)) ? (unsigned char) tolower(*s1) :
*s1;
c2= (isascii(*s2) && isupper(*s2)) ? (unsigned char) tolower(*s2) :
*s2;
if (c1 != c2)
return ccmp(c1,c2);
s1++;
s2++;
}
}
#undef ccmp
#endif


IMPLEMENTATION 2:

int strcasecmp(const char *s1, const char *s2)
{
unsigned char c1,c2;
do {
c1 = *s1++;
c2 = *s2++;
c1 = (unsigned char) tolower( (unsigned char) c1);
c2 = (unsigned char) tolower( (unsigned char) c2);
}
while((c1 == c2) && (c1 != '\0'));
return (int) c1-c2;
}

How about:

int strcasecmp( const char *s1, const char *s2 )
{
    while (1)
    {
        int c1 = tolower( (unsigned char) *s1++ );
        int c2 = tolower( (unsigned char) *s2++ );
        if (c1 == 0 || c1 != c2) return c1 - c2;
    }
}

Doesn't reuse variables, doesn't have iffy casts, slightly shorter.
What have I missed?
 
William Krick

To make the code compatible with older C compilers, I had to move the
declaration of c1 & c2 up to the top. I also changed it from while(1)
to for(;;) because the while was throwing a warning. However, while
this code works great in most cases, it doesn't handle NULL strings and
just blows up.

int strcasecmp( const char *s1, const char *s2 )
{
    int c1, c2;
    for(;;)
    {
        c1 = tolower( (unsigned char) *s1++ );
        c2 = tolower( (unsigned char) *s2++ );
        if (c1 == 0 || c1 != c2)
            return c1 - c2;
    }
}
 
Richard Tobin

William Krick said:
To make the code compatible with older C compilers, I had to move the
declaration of c1 & c2 up to the top.

Really? What C compilers are these?

I also changed it from while(1)
to for(;;) because the while was throwing a warning.

Definitely time for a new compiler!

However, while
this code works great in most cases, it doesn't handle NULL strings and
just blows up.

Since when were the str* functions supposed to handle NULL?

-- Richard
 
Chris Dollin

William said:
To make the code compatible with older C compilers, I had to move the
declaration of c1 & c2 up to the top.

You're not serious, surely. You have C compilers that don't allow
declarations in nested blocks? This is not a recent feature.

I also changed it from while(1)
to for(;;) because the while was throwing a warning.

Well, OK I suppose; but I'd be deeply suspicious of such a warning
myself (perhaps I use while(1) more than usual).

However, while
this code works great in most cases, it doesn't handle NULL strings and
just blows up.

There are no such things as NULL strings. There are null (empty) strings,
and there are null pointers, which are not any kind of string. If you mean
that it doesn't handle null pointer arguments, well no, it doesn't; it
does case-insensitive string compare, and NULL isn't a string. The user
should not call it with null pointer arguments.

(I can't think of a sensible answer to return for any null argument. If
you really really want to guard for this case, just add to the top

if (s1 == 0 || s2 == 0) return WHATEVERYOUWANT;

or wrap it as

if (s1 && s2) THEPREVIOUSCODE
else return WHATEVERYOUWANT;
)
 
William Krick

Chris said:
There are no such things as NULL strings. There are null (empty) strings,
and there are null pointers, which are not any kind of string. If you mean
that it doesn't handle null pointer arguments, well no, it doesn't; it
does case-insensitive string compare, and NULL isn't a string. The user
should not call it with null pointer arguments.


Point taken. This is my first foray back into C after being a Java
programmer for 5 years. I admit I'm VERY rusty.

I think Richard Tobin was right when he said...
"Since when were the str* functions supposed to handle NULL?"

I shouldn't be trying to handle null pointers.

Thanks for your help everyone.
 
Skarmander

Chris said:
William Krick wrote:


There are no such things as NULL strings. There are null (empty) strings,
and there are null pointers, which are not any kind of string. If you mean
that it doesn't handle null pointer arguments, well no, it doesn't; it
does case-insensitive string compare, and NULL isn't a string. The user
should not call it with null pointer arguments.

(I can't think of a sensible answer to return for any null argument. If
you really really want to guard for this case, just add to the top

if (s1 == 0 || s2 == 0) return WHATEVERYOUWANT;

I'd prefer

assert(s1 && s2);

Or a (documented) redefinition of the semantics (e.g., treat 0 as "").

You could use WHATEVERYOUWANT if you document either it or the fact that
passing null pointers will yield an indeterminate value. Don't keep it
under the hood, in any case.

S.
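
The two alternatives mentioned here (an assert, or a documented rule that a null pointer is treated as "") might look roughly like this. It is a sketch only: str_ccmp_checked and str_ccmp_nullsafe are hypothetical names, and the loop body is borrowed from the version posted earlier in the thread.

```c
#include <assert.h>
#include <ctype.h>

/* Variant 1: a null pointer is a caller bug; trap it in debug builds.
   With NDEBUG defined the assert vanishes and behaviour is undefined. */
int str_ccmp_checked(const char *s1, const char *s2)
{
    int c1, c2;

    assert(s1 && s2);
    for (;;) {
        c1 = tolower((unsigned char) *s1++);
        c2 = tolower((unsigned char) *s2++);
        if (c1 == 0 || c1 != c2)
            return c1 - c2;
    }
}

/* Variant 2: documented redefinition, a null pointer compares as "". */
int str_ccmp_nullsafe(const char *s1, const char *s2)
{
    return str_ccmp_checked(s1 ? s1 : "", s2 ? s2 : "");
}
```

Whichever variant is chosen, the behaviour for null pointers belongs in the function's documentation.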
 
Alan Balmer

Well, OK I suppose; but I'd be deeply suspicious of such a warning
myself (perhaps I used while(1) more than usual).

In my experience, this warning is quite common. The warning is that
the condition is always true. In more complex cases, it can be useful.

I seem to remember that we had a fairly lengthy thread about this not
long ago. Might have been a different NG.
 
pete

William said:
I'm currently evaluating two implementations of a case insensitive
string comparison function to replace the non-ANSI stricmp().
Both of
the implementations below seem to work fine
but I'm wondering if one is
better than the other or if there is some sort of hybrid of the two
that would be superior.

This is what I use:

int str_ccmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (toupper(*p1) == toupper(*p2)) {
        if (*p1 == '\0') {
            return 0;
        }
        ++p1;
        ++p2;
    }
    return toupper(*p2) > toupper(*p1) ? -1 : 1;
}
 
Eric Sosman

pete wrote On 11/10/05 12:58,:
This is what I use:

int str_ccmp(const char *s1, const char *s2)
{
const unsigned char *p1 = (const unsigned char *)s1;
const unsigned char *p2 = (const unsigned char *)s2;

These would be incorrect (or at the very least dubious)
on signed-magnitude or ones' complement machines. In the
Immortal Words (which someone recently mis-attributed to
me; they're by Henry Spencer): "If you lie to the compiler,
it will have its revenge." The above are lies, so ...
 
Ben Pfaff

Eric Sosman said:
pete wrote On 11/10/05 12:58,:
int str_ccmp(const char *s1, const char *s2)
{
const unsigned char *p1 = (const unsigned char *)s1;
const unsigned char *p2 = (const unsigned char *)s2;

These would be incorrect (or at the very least dubious)
on signed-magnitude or ones' complement machines. [...]

What problem do you have in mind here?
 
William Krick

William said:
To make the code compatible with older C compilers, I had to move the
declaration of c1 & c2 up to the top. I also changed it from while(1)
to for(;;) because the while was throwing a warning.

int strcasecmp( const char *s1, const char *s2 )
{
int c1, c2;
for(;;)
{
c1 = tolower( (unsigned char) *s1++ );
c2 = tolower( (unsigned char) *s2++ );
if (c1 == 0 || c1 != c2)
return c1 - c2;
}
}


One final revision. I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

int str_ccmp( const char *s1, const char *s2 )
{
    int c1, c2;
    for(;;)
    {
        c1 = tolower( (unsigned char) *s1++ );
        c2 = tolower( (unsigned char) *s2++ );
        if (c1 == 0 || c1 != c2)
            return c1 == c2 ? 0 : c1 > c2 ? 1 : -1;
    }
}
 
Ben Pfaff

[case-insensitive strcmp-like function]
I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

strcmp() isn't specified so strictly. You can't depend on it
returning exactly -1 or 1. Here's what the standard says:

3 The strcmp function returns an integer greater than, equal to,
or less than zero, accordingly as the string pointed to by
s1 is greater than, equal to, or less than the string
pointed to by s2.
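
In practice that means callers should test only the sign of the result, never compare it against -1 or 1. A small sketch (comes_before is a hypothetical helper, not something from the standard library):

```c
#include <string.h>

/* Portable: relies only on the sign of strcmp's result.
   Writing strcmp(a, b) == -1 instead would be wrong on any
   implementation that returns some other negative value. */
static int comes_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}
```

The same rule applies to any strcmp-like replacement: code that checks `< 0`, `== 0` and `> 0` keeps working whichever return convention the function uses.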
 
Skarmander

William said:
One final revision. I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

int str_ccmp( const char *s1, const char *s2 )
{
int c1, c2;
for(;;)
{
c1 = tolower( (unsigned char) *s1++ );
c2 = tolower( (unsigned char) *s2++ );
if (c1 == 0 || c1 != c2)
return c1 == c2 ? 0 : c1 > c2 ? 1 : -1;
}
}

Aww, you ruined it. What next? Make it a do-while with a neat return at
the end?

S.
 
Eric Sosman

Ben Pfaff wrote On 11/10/05 13:49,:
pete wrote On 11/10/05 12:58,:
int str_ccmp(const char *s1, const char *s2)
{
const unsigned char *p1 = (const unsigned char *)s1;
const unsigned char *p2 = (const unsigned char *)s2;

These would be incorrect (or at the very least dubious)
on signed-magnitude or ones' complement machines. [...]


What problem do you have in mind here?

char c = -1;
unsigned char *puc = (unsigned char*)&c;
printf ("%d ?= %d\n", (unsigned char)c, *puc);

Expected output (assuming 8-bit characters):

255 ?= 255 (two's complement)
255 ?= 129 (signed magnitude)
255 ?= 254 (ones' complement)

For the <ctype.h> functions, the argument corresponding to
the `char' whose value is -1 is `(int)UCHAR_MAX', always,
or 255 in the three cases above. Conversion from signed to
unsigned can involve more than just reinterpreting the bits.
 
pete

Eric said:
Ben Pfaff wrote On 11/10/05 13:49,:
pete wrote On 11/10/05 12:58,:

int str_ccmp(const char *s1, const char *s2)
{
const unsigned char *p1 = (const unsigned char *)s1;
const unsigned char *p2 = (const unsigned char *)s2;

These would be incorrect (or at the very least dubious)
on signed-magnitude or ones' complement machines. [...]

I disagree.

For the <ctype.h> functions, the argument corresponding to
the `char' whose value is -1 is `(int)UCHAR_MAX', always,
or 255 in the three cases above. Conversion from signed to
unsigned can involve more than just reinterpreting the bits.

The string functions are relevant here,
in particular, the comparison functions.

N869
7.21.4 Comparison functions
[#1] The sign of a nonzero value returned by the comparison
functions memcmp, strcmp, and strncmp is determined by the
sign of the difference between the values of the first pair
of characters (both interpreted as unsigned char) that
differ in the objects being compared.
 
Eric Sosman

pete wrote On 11/10/05 15:30,:
Eric said:
Ben Pfaff wrote On 11/10/05 13:49,:
pete wrote On 11/10/05 12:58,:


int str_ccmp(const char *s1, const char *s2)
{
const unsigned char *p1 = (const unsigned char *)s1;
const unsigned char *p2 = (const unsigned char *)s2;

These would be incorrect (or at the very least dubious)
on signed-magnitude or ones' complement machines. [...]


I disagree.

For the <ctype.h> functions, the argument corresponding to
the `char' whose value is -1 is `(int)UCHAR_MAX', always,
or 255 in the three cases above. Conversion from signed to
unsigned can involve more than just reinterpreting the bits.


The string functions are relevant here,
in particular, the comparison functions.

N869
7.21.4 Comparison functions
[#1] The sign of a nonzero value returned by the comparison
functions memcmp, strcmp, and strncmp is determined by the
sign of the difference between the values of the first pair
of characters (both interpreted as unsigned char) that
differ in the objects being compared.

Well, that's why I said "or at the very least dubious."
The characters being plucked from the string are handed as
arguments to tolower(), and the only requirement I can find
on the <ctype.h> argument values is 7.4/1:

"[...] the value of which shall be representable as
an unsigned char or shall equal the value of the
macro EOF."

The Standard is not entirely clear about what should
be done with negative `char' values. We know they need to
be made non-negative to become `unsigned char' values, but
is this to be done by conversion (my assumption) or by
reinterpretation (yours)? Nothing I can find in the Standard
or in the Rationale seems to shed any light. Hence "dubious"
rather than an unqualified "incorrect."
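
Under the conversion reading, the call is written with a cast of the character value rather than of the pointer. A minimal sketch, with lower_of as a hypothetical helper name:

```c
#include <ctype.h>

/* Convert the char VALUE to unsigned char before calling tolower(),
   so the argument is always in the range 7.4/1 requires. Passing a
   negative char directly (other than one equal to EOF) is undefined. */
static int lower_of(char c)
{
    return tolower((unsigned char) c);
}
```

On two's complement machines this gives the same result as reading the byte through an unsigned char pointer; the two approaches can only differ on the other representations discussed above.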
 
Peter Nilsson

Eric said:
...
The Standard is not entirely clear about what should
be done with negative `char' values. We know they need to
be made non-negative to become `unsigned char' values, but
is this to be done by conversion (my assumption) or by
reinterpretation (yours)? Nothing I can find in the Standard
or in the Rationale seems to shed any light. Hence "dubious"
rather than an unqualified "incorrect."

Some previous queries by myself on the issue...

http://groups.google.com/group/comp...2c290cf1f7/346b9c5072670cfd?#346b9c5072670cfd

http://groups.google.com/group/comp...f4278ffe943/c49809d907620f55#c49809d907620f55
 
Jordan Abel

One final revision. I've modified the return statement so that it
returns -1 / 0 / 1 to bring it in line with the behaviour of other
similar functions...

Such as which ones?

strcmp("foo","bar") returns 4 on my system, and probably yours. Only a
positive value is required by the standard in that case.
 
pete

Eric said:
pete wrote On 11/10/05 15:30,:
The Standard is not entirely clear about what should
be done with negative `char' values. We know they need to
be made non-negative to become `unsigned char' values, but
is this to be done by conversion (my assumption) or by
reinterpretation (yours)?

I'm seeing "interpreted as unsigned char"
in the above quote from the standard.
*(unsigned char *)byte is what "interpreted as" means.
The standard isn't shy about using the word "converted".
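
A short illustration of that reading, as a sketch only (first_byte_diff and same_sign_as_strcmp are hypothetical names): the first bytes are read through an unsigned char lvalue, and the sign of their difference is checked against what strcmp itself reports.

```c
#include <string.h>

/* Difference of the first bytes, each read as unsigned char,
   which is how 7.21.4 describes the comparison functions. */
static int first_byte_diff(const char *a, const char *b)
{
    return *(const unsigned char *) a - *(const unsigned char *) b;
}

/* Nonzero if the sign of first_byte_diff matches the sign of strcmp.
   Only meaningful for strings that differ in their first byte. */
static int same_sign_as_strcmp(const char *a, const char *b)
{
    int mine = first_byte_diff(a, b);
    int lib = strcmp(a, b);

    return (mine > 0) == (lib > 0) && (mine < 0) == (lib < 0);
}
```

For example, "\xff" compares greater than "\x01" with both functions, even on implementations where plain char is signed and the first byte would be negative as a char.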
 
