comparing two strcasecmp (stricmp) implementations

Discussion in 'C Programming' started by William Krick, Nov 10, 2005.

  1. I'm currently evaluating two implementations of a case insensitive
    string comparison function to replace the non-ANSI stricmp(). Both of
    the implementations below seem to work fine but I'm wondering if one is
    better than the other or if there is some sort of hybrid of the two
    that would be superior.


    IMPLEMENTATION 1:

    #ifndef HAVE_STRCASECMP
    #define ccmp(a,b) ((a) == (b) ? 0 : ((a) > (b) ? 1 : -1))
    int strcasecmp(unsigned char *s1, unsigned char *s2)
    {
        unsigned char c1, c2;
        for ( ; ; )
        {
            if (*s1 == '\0' || *s2 == '\0')
                return ccmp(*s1,*s2);
            c1 = (isascii(*s1) && isupper(*s1)) ? (unsigned char) tolower(*s1) : *s1;
            c2 = (isascii(*s2) && isupper(*s2)) ? (unsigned char) tolower(*s2) : *s2;
            if (c1 != c2)
                return ccmp(c1,c2);
            s1++;
            s2++;
        }
    }
    #undef ccmp
    #endif


    IMPLEMENTATION 2:

    int strcasecmp(const char *s1, const char *s2)
    {
        unsigned char c1,c2;
        do {
            c1 = *s1++;
            c2 = *s2++;
            c1 = (unsigned char) tolower( (unsigned char) c1);
            c2 = (unsigned char) tolower( (unsigned char) c2);
        }
        while((c1 == c2) && (c1 != '\0'));
        return (int) c1-c2;
    }
    William Krick, Nov 10, 2005
    #1

  2. Chris Dollin

    William Krick wrote:

    > I'm currently evaluating two implementations of a case insensitive
    > string comparison function to replace the non-ANSI stricmp(). Both of
    > the implementations below seem to work fine but I'm wondering if one is
    > better than the other or if there is some sort of hybrid of the two
    > that would be superior.
    >
    >
    > IMPLEMENTATION 1:
    >
    > #ifndef HAVE_STRCASECMP
    > #define ccmp(a,b) ((a) == (b) ? 0 : ((a) > (b) ? 1 : -1))
    > int strcasecmp(unsigned char *s1, unsigned char *s2)
    > {
    > unsigned char c1, c2;
    > for ( ; ; )
    > {
    > if (*s1 == '\0' || *s2 == '\0')
    > return ccmp(*s1,*s2);
    > c1= (isascii(*s1) && isupper(*s1)) ? (unsigned char) tolower(*s1) :
    > *s1;
    > c2= (isascii(*s2) && isupper(*s2)) ? (unsigned char) tolower(*s2) :
    > *s2;
    > if (c1 != c2)
    > return ccmp(c1,c2);
    > s1++;
    > s2++;
    > }
    > }
    > #undef ccmp
    > #endif
    >
    >
    > IMPLEMENTATION 2:
    >
    > int strcasecmp(const char *s1, const char *s2)
    > {
    > unsigned char c1,c2;
    > do {
    > c1 = *s1++;
    > c2 = *s2++;
    > c1 = (unsigned char) tolower( (unsigned char) c1);
    > c2 = (unsigned char) tolower( (unsigned char) c2);
    > }
    > while((c1 == c2) && (c1 != '\0'));
    > return (int) c1-c2;
    > }


    How about:

    int strcasecmp( const char *s1, const char *s2 )
    {
        while (1)
        {
            int c1 = tolower( (unsigned char) *s1++ );
            int c2 = tolower( (unsigned char) *s2++ );
            if (c1 == 0 || c1 != c2) return c1 - c2;
        }
    }

    Doesn't reuse variables, doesn't have iffy casts, slightly shorter.
    What have I missed?

    --
    Chris "one-track" Dollin
    Capability does not imply necessity.
    Chris Dollin, Nov 10, 2005
    #2
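
A quick way to exercise the loop posted above is a handful of assertions that look only at the sign of the result. This is a minimal sketch: the names ci_strcmp and ci_strcmp_selfcheck are hypothetical (chosen so the code does not clash with a system-provided strcasecmp), and the default "C" locale is assumed.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical name; the body is the same loop as in the post above. */
int ci_strcmp(const char *s1, const char *s2)
{
    while (1)
    {
        int c1 = tolower((unsigned char) *s1++);
        int c2 = tolower((unsigned char) *s2++);
        if (c1 == 0 || c1 != c2) return c1 - c2;
    }
}

/* Checks the sign of each result, never its magnitude. */
void ci_strcmp_selfcheck(void)
{
    assert(ci_strcmp("Hello", "hELLO") == 0);
    assert(ci_strcmp("", "") == 0);
    assert(ci_strcmp("abc", "ABD") < 0);
    assert(ci_strcmp("abcd", "ABC") > 0);  /* a longer string sorts after its prefix */
    assert(ci_strcmp("", "a") < 0);
}
```

Calling ci_strcmp_selfcheck() from main() should return silently if the function behaves as described.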

  3. To make the code compatible with older C compilers, I had to move the
    declaration of c1 & c2 up to the top. I also changed it from while(1)
    to for(;;) because the while was throwing a warning. However, while
    this code works great in most cases, it doesn't handle NULL strings and
    just blows up.

    int strcasecmp( const char *s1, const char *s2 )
    {
        int c1, c2;
        for(;;)
        {
            c1 = tolower( (unsigned char) *s1++ );
            c2 = tolower( (unsigned char) *s2++ );
            if (c1 == 0 || c1 != c2)
                return c1 - c2;
        }
    }
    William Krick, Nov 10, 2005
    #3
  4. William Krick wrote:

    >To make the code compatible with older C compilers, I had to move the
    >declaration of c1 & c2 up to the top.


    Really? What C compilers are these?

    >I also changed it from while(1)
    >to for(;;) because the while was throwing a warning.


    Definitely time for a new compiler!

    >However, while
    >this code works great in most cases, it doesn't handle NULL strings and
    >just blows up.


    Since when were the str* functions supposed to handle NULL?

    -- Richard
    Richard Tobin, Nov 10, 2005
    #4
  5. Chris Dollin

    William Krick wrote:

    > To make the code compatible with older C compilers, I had to move the
    > declaration of c1 & c2 up to the top.


    You're not serious, surely. You have C compilers that don't allow
    declarations in nested blocks? This is not a recent feature.

    > I also changed it from while(1)
    > to for(;;) because the while was throwing a warning.


    Well, OK I suppose; but I'd be deeply suspicious of such a warning
    myself (perhaps I used while(1) more than usual).

    > However, while
    > this code works great in most cases, it doesn't handle NULL strings and
    > just blows up.


    There are no such things as NULL strings. There are null (empty) strings,
    and there are null pointers, which are not any kind of string. If you mean
    that it doesn't handle null pointer arguments, well no, it doesn't; it
    does case-insensitive string compare, and NULL isn't a string. The user
    should not call it with null pointer arguments.

    (I can't think of a sensible answer to return for any null argument. If
    you really really want to guard for this case, just add to the top

    if (s1 == 0 || s2 == 0) return WHATEVERYOUWANT;

    or wrap it as

    if (s1 && s2) THEPREVIOUSCODE
    else return WHATEVERYOUWANT;
    )

    > int strcasecmp( const char *s1, const char *s2 )
    > {
    > int c1, c2;
    > for(;;)
    > {
    > c1 = tolower( (unsigned char) *s1++ );
    > c2 = tolower( (unsigned char) *s2++ );
    > if (c1 == 0 || c1 != c2)
    > return c1 - c2;
    > }
    > }


    --
    Chris "one-track" Dollin
    Capability does not imply necessity.
    Chris Dollin, Nov 10, 2005
    #5
  6. Chris Dollin wrote:
    > William Krick wrote:
    >
    > > However, while
    > > this code works great in most cases, it doesn't handle NULL strings and
    > > just blows up.

    >
    > There are no such things as NULL strings. There are null (empty) strings,
    > and there are null pointers, which are not any kind of string. If you mean
    > that it doesn't handle null pointer arguments, well no, it doesn't; it
    > does case-insensitive string compare, and NULL isn't a string. The user
    > should not call it with null pointer arguments.



    Point taken. This is my first foray back into C after being a Java
    programmer for 5 years. I admit I'm VERY rusty.

    I think Richard Tobin was right when he said...
    "Since when were the str* functions supposed to handle NULL?"

    I shouldn't be trying to handle null pointers.

    Thanks for your help everyone.
    William Krick, Nov 10, 2005
    #6
  7. Skarmander

    Chris Dollin wrote:
    > William Krick wrote:

    <snip>
    >>However, while
    >>this code works great in most cases, it doesn't handle NULL strings and
    >>just blows up.

    >
    >
    > There are no such things as NULL strings. There are null (empty) strings,
    > and there are null pointers, which are not any kind of string. If you mean
    > that it doesn't handle null pointer arguments, well no, it doesn't; it
    > does case-insensitive string compare, and NULL isn't a string. The user
    > should not call it with null pointer arguments.
    >
    > (I can't think of a sensible answer to return for any null argument. If
    > you really really want to guard for this case, just add to the top
    >
    > if (s1 == 0 || s2 == 0) return WHATEVERYOUWANT;
    >

    I'd prefer

    assert(s1 && s2);

    Or a (documented) redefinition of the semantics (e.g., treat 0 as "").

    You could use WHATEVERYOUWANT if you document either it or the fact that
    passing null pointers will yield an indeterminate value. Don't keep it
    under the hood, in any case.

    S.
    Skarmander, Nov 10, 2005
    #7
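
The two options discussed above (assert on null pointers, or document that a null pointer is treated as "") might be sketched as follows. The names ci_strcmp and ci_strcmp_nullsafe are hypothetical, and treating null as the empty string is only one possible documented choice, not something the standard str* functions do.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Core comparison. Precondition: both pointers are non-null. */
int ci_strcmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);  /* compiled out when NDEBUG is defined */
    for (;;)
    {
        int c1 = tolower((unsigned char) *s1++);
        int c2 = tolower((unsigned char) *s2++);
        if (c1 == 0 || c1 != c2)
            return c1 - c2;
    }
}

/* Documented alternative: a null pointer compares as the empty string "". */
int ci_strcmp_nullsafe(const char *s1, const char *s2)
{
    return ci_strcmp(s1 ? s1 : "", s2 ? s2 : "");
}
```

Keeping the guard in a separate wrapper leaves the core function with the same contract as strcmp, so callers who never pass null pointers pay nothing for the check.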
  8. Alan Balmer

    On Thu, 10 Nov 2005 17:27:47 +0000, Chris Dollin wrote:

    >> I also changed it from while(1)
    >> to for(;;) because the while was throwing a warning.

    >
    >Well, OK I suppose; but I'd be deeply suspicious of such a warning
    >myself (perhaps I used while(1) more than usual).


    In my experience, this warning is quite common. The warning is that
    the condition is always true. In more complex cases, it can be useful.

    I seem to remember that we had a fairly lengthy thread about this not
    long ago. Might have been a different NG.
    --
    Al Balmer
    Balmer Consulting
    Alan Balmer, Nov 10, 2005
    #8
  9. pete

    William Krick wrote:
    >
    > I'm currently evaluating two implementations of a case insensitive
    > string comparison function to replace the non-ANSI stricmp().
    > Both of
    > the implementations below seem to work fine
    > but I'm wondering if one is
    > better than the other or if there is some sort of hybrid of the two
    > that would be superior.


    This is what I use:

    int str_ccmp(const char *s1, const char *s2)
    {
        const unsigned char *p1 = (const unsigned char *)s1;
        const unsigned char *p2 = (const unsigned char *)s2;

        while (toupper(*p1) == toupper(*p2)) {
            if (*p1 == '\0') {
                return 0;
            }
            ++p1;
            ++p2;
        }
        return toupper(*p2) > toupper(*p1) ? -1 : 1;
    }

    --
    pete
    pete, Nov 10, 2005
    #9
  10. Eric Sosman

    pete wrote On 11/10/05 12:58,:
    > William Krick wrote:
    >
    >>I'm currently evaluating two implementations of a case insensitive
    >>string comparison function to replace the non-ANSI stricmp().
    >> Both of
    >>the implementations below seem to work fine
    >>but I'm wondering if one is
    >>better than the other or if there is some sort of hybrid of the two
    >>that would be superior.

    >
    >
    > This is what I use:
    >
    > int str_ccmp(const char *s1, const char *s2)
    > {
    > const unsigned char *p1 = (const unsigned char *)s1;
    > const unsigned char *p2 = (const unsigned char *)s2;


    These would be incorrect (or at the very least dubious)
    on signed-magnitude or ones' complement machines. In the
    Immortal Words (which someone recently mis-attributed to
    me; they're by Henry Spencer): "If you lie to the compiler,
    it will have its revenge." The above are lies, so ...

    --
    Eric Sosman, Nov 10, 2005
    #10
  11. Ben Pfaff

    Eric Sosman writes:

    > pete wrote On 11/10/05 12:58,:
    >> int str_ccmp(const char *s1, const char *s2)
    >> {
    >> const unsigned char *p1 = (const unsigned char *)s1;
    >> const unsigned char *p2 = (const unsigned char *)s2;

    >
    > These would be incorrect (or at the very least dubious)
    > on signed-magnitude or ones' complement machines. [...]


    What problem do you have in mind here?
    --
    "It would be a much better example of undefined behavior
    if the behavior were undefined."
    --Michael Rubenstein
    Ben Pfaff, Nov 10, 2005
    #11
  12. William Krick wrote:
    > To make the code compatible with older C compilers, I had to move the
    > declaration of c1 & c2 up to the top. I also changed it from while(1)
    > to for(;;) because the while was throwing a warning.
    >
    > int strcasecmp( const char *s1, const char *s2 )
    > {
    > int c1, c2;
    > for(;;)
    > {
    > c1 = tolower( (unsigned char) *s1++ );
    > c2 = tolower( (unsigned char) *s2++ );
    > if (c1 == 0 || c1 != c2)
    > return c1 - c2;
    > }
    > }



    One final revision. I've modified the return statement so that it
    returns -1 / 0 / 1 to bring it in line with the behaviour of other
    similar functions...

    int str_ccmp( const char *s1, const char *s2 )
    {
        int c1, c2;
        for(;;)
        {
            c1 = tolower( (unsigned char) *s1++ );
            c2 = tolower( (unsigned char) *s2++ );
            if (c1 == 0 || c1 != c2)
                return c1 == c2 ? 0 : c1 > c2 ? 1 : -1;
        }
    }
    William Krick, Nov 10, 2005
    #12
  13. Ben Pfaff

    William Krick writes:

    [case-insensitive strcmp-like function]

    > I've modified the return statement so that it
    > returns -1 / 0 / 1 to bring it in line with the behaviour of other
    > similar functions...


    strcmp() isn't specified so strictly. You can't depend on it
    returning exactly -1 or 1. Here's what the standard says:

    3 The strcmp function returns an integer greater than, equal to,
    or less than zero, accordingly as the string pointed to by
    s1 is greater than, equal to, or less than the string
    pointed to by s2.

    --
    "Some programming practices beg for errors;
    this one is like calling an 800 number
    and having errors delivered to your door."
    --Steve McConnell
    Ben Pfaff, Nov 10, 2005
    #13
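
Since only the sign of a comparison result is specified, portable callers test against zero. If an exact -1 / 0 / 1 is wanted anyway, the result can be collapsed afterwards. A minimal sketch, where sign_of and sign_only_checks are hypothetical helper names:

```c
#include <assert.h>
#include <string.h>

/* Collapse any comparison result to exactly -1, 0, or 1. */
int sign_of(int r)
{
    return (r > 0) - (r < 0);
}

/* Portable callers look only at the sign of strcmp's result. */
void sign_only_checks(void)
{
    assert(strcmp("foo", "bar") > 0);            /* the magnitude is unspecified */
    assert(sign_of(strcmp("foo", "bar")) == 1);
    assert(sign_of(strcmp("bar", "foo")) == -1);
    assert(sign_of(strcmp("foo", "foo")) == 0);
}
```

The same helper works on the result of any strcmp-style function, so the comparison routine itself can keep the simpler `return c1 - c2;`.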
  14. Skarmander

    William Krick wrote:
    > William Krick wrote:
    >
    >>To make the code compatible with older C compilers, I had to move the
    >>declaration of c1 & c2 up to the top. I also changed it from while(1)
    >>to for(;;) because the while was throwing a warning.
    >>
    >>int strcasecmp( const char *s1, const char *s2 )
    >>{
    >> int c1, c2;
    >> for(;;)
    >> {
    >> c1 = tolower( (unsigned char) *s1++ );
    >> c2 = tolower( (unsigned char) *s2++ );
    >> if (c1 == 0 || c1 != c2)
    >> return c1 - c2;
    >> }
    >>}

    >
    >
    >
    > One final revision. I've modified the return statement so that it
    > returns -1 / 0 / 1 to bring it in line with the behaviour of other
    > similar functions...
    >
    > int str_ccmp( const char *s1, const char *s2 )
    > {
    > int c1, c2;
    > for(;;)
    > {
    > c1 = tolower( (unsigned char) *s1++ );
    > c2 = tolower( (unsigned char) *s2++ );
    > if (c1 == 0 || c1 != c2)
    > return c1 == c2 ? 0 : c1 > c2 ? 1 : -1;
    > }
    > }
    >


    Aww, you ruined it. What next? Make it a do-while with a neat return at
    the end?

    S.
    Skarmander, Nov 10, 2005
    #14
  15. Eric Sosman

    Ben Pfaff wrote On 11/10/05 13:49,:
    > Eric Sosman <> writes:
    >
    >
    >>pete wrote On 11/10/05 12:58,:
    >>
    >>>int str_ccmp(const char *s1, const char *s2)
    >>>{
    >>> const unsigned char *p1 = (const unsigned char *)s1;
    >>> const unsigned char *p2 = (const unsigned char *)s2;

    >>
    >> These would be incorrect (or at the very least dubious)
    >>on signed-magnitude or ones' complement machines. [...]

    >
    >
    > What problem do you have in mind here?


    char c = -1;
    unsigned char *puc = (unsigned char*)&c;
    printf ("%d ?= %d\n", (unsigned char)c, *puc);

    Expected output (assuming 8-bit characters):

    255 ?= 255 (two's complement)
    255 ?= 129 (signed magnitude)
    255 ?= 254 (ones' complement)

    For the <ctype.h> functions, the argument corresponding to
    the `char' whose value is -1 is `(int)UCHAR_MAX', always,
    or 255 in the three cases above. Conversion from signed to
    unsigned can involve more than just reinterpreting the bits.

    --
    Eric Sosman, Nov 10, 2005
    #15
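
The distinction drawn above between converting a value and reinterpreting its bits can be put into two small functions. This is only a sketch: by_conversion, by_reinterpretation and check_minus_one are hypothetical names, signed char is used instead of plain char so the example does not depend on whether char is signed, and the second assertion is guarded because it is expected to hold on two's complement machines only.

```c
#include <assert.h>
#include <limits.h>

/* Conversion: the value is reduced modulo UCHAR_MAX + 1 (C99 6.3.1.3). */
int by_conversion(signed char c)
{
    return (unsigned char) c;
}

/* Reinterpretation: the same byte is read through an unsigned char lvalue. */
int by_reinterpretation(signed char c)
{
    return *(unsigned char *) &c;
}

void check_minus_one(void)
{
    /* Holds on every conforming implementation. */
    assert(by_conversion(-1) == UCHAR_MAX);

    /* (-1 & 3) == 3 identifies two's complement; only there do the
       two approaches agree for negative values. */
    if ((-1 & 3) == 3)
        assert(by_reinterpretation(-1) == UCHAR_MAX);
}
```

For non-negative values the two functions always agree, which is why the difference only matters for characters outside the basic execution character set.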
  16. pete

    Eric Sosman wrote:
    >
    > Ben Pfaff wrote On 11/10/05 13:49,:
    > > Eric Sosman <> writes:
    > >
    > >
    > >>pete wrote On 11/10/05 12:58,:
    > >>
    > >>>int str_ccmp(const char *s1, const char *s2)
    > >>>{
    > >>> const unsigned char *p1 = (const unsigned char *)s1;
    > >>> const unsigned char *p2 = (const unsigned char *)s2;
    > >>
    > >> These would be incorrect (or at the very least dubious)
    > >>on signed-magnitude or ones' complement machines. [...]


    I disagree.

    > For the <ctype.h> functions, the argument corresponding to
    > the `char' whose value is -1 is `(int)UCHAR_MAX', always,
    > or 255 in the three cases above. Conversion from signed to
    > unsigned can involve more than just reinterpreting the bits.


    The string functions are relevant here,
    in particular, the comparison functions.

    N869
    7.21.4 Comparison functions
    [#1] The sign of a nonzero value returned by the comparison
    functions memcmp, strcmp, and strncmp is determined by the
    sign of the difference between the values of the first pair
    of characters (both interpreted as unsigned char) that
    differ in the objects being compared.

    --
    pete
    pete, Nov 10, 2005
    #16
  17. Eric Sosman

    pete wrote On 11/10/05 15:30,:
    > Eric Sosman wrote:
    >
    >>Ben Pfaff wrote On 11/10/05 13:49,:
    >>
    >>>Eric Sosman <> writes:
    >>>
    >>>
    >>>
    >>>>pete wrote On 11/10/05 12:58,:
    >>>>
    >>>>
    >>>>>int str_ccmp(const char *s1, const char *s2)
    >>>>>{
    >>>>> const unsigned char *p1 = (const unsigned char *)s1;
    >>>>> const unsigned char *p2 = (const unsigned char *)s2;
    >>>>
    >>>> These would be incorrect (or at the very least dubious)
    >>>>on signed-magnitude or ones' complement machines. [...]

    >
    >
    > I disagree.
    >
    >
    >>For the <ctype.h> functions, the argument corresponding to
    >>the `char' whose value is -1 is `(int)UCHAR_MAX', always,
    >>or 255 in the three cases above. Conversion from signed to
    >>unsigned can involve more than just reinterpreting the bits.

    >
    >
    > The string functions are relevant here,
    > in particular, the comparison functions.
    >
    > N869
    > 7.21.4 Comparison functions
    > [#1] The sign of a nonzero value returned by the comparison
    > functions memcmp, strcmp, and strncmp is determined by the
    > sign of the difference between the values of the first pair
    > of characters (both interpreted as unsigned char) that
    > differ in the objects being compared.


    Well, that's why I said "or at the very least dubious."
    The characters being plucked from the string are handed as
    arguments to tolower(), and the only requirement I can find
    on the <ctype.h> argument values is 7.4/1:

    "[...] the value of which shall be representable as
    an unsigned char or shall equal the value of the
    macro EOF."

    The Standard is not entirely clear about what should
    be done with negative `char' values. We know they need to
    be made non-negative to become `unsigned char' values, but
    is this to be done by conversion (my assumption) or by
    reinterpretation (yours)? Nothing I can find in the Standard
    or in the Rationale seems to shed any light. Hence "dubious"
    rather than an unqualified "incorrect."

    --
    Eric Sosman, Nov 10, 2005
    #17
  18. Eric Sosman wrote:
    > ...
    > The Standard is not entirely clear about what should
    > be done with negative `char' values. We know they need to
    > be made non-negative to become `unsigned char' values, but
    > is this to be done by conversion (my assumption) or by
    > reinterpretation (yours)? Nothing I can find in the Standard
    > or in the Rationale seems to shed any light. Hence "dubious"
    > rather than an unqualified "incorrect."


    Some previous queries by myself on the issue...

    http://groups.google.com/group/comp...2c290cf1f7/346b9c5072670cfd?#346b9c5072670cfd

    http://groups.google.com/group/comp...f4278ffe943/c49809d907620f55#c49809d907620f55

    --
    Peter
    Peter Nilsson, Nov 10, 2005
    #18
  19. Jordan Abel

    On 2005-11-10, William Krick wrote:
    >
    > William Krick wrote:
    >> To make the code compatible with older C compilers, I had to move the
    >> declaration of c1 & c2 up to the top. I also changed it from while(1)
    >> to for(;;) because the while was throwing a warning.
    >>
    >> int strcasecmp( const char *s1, const char *s2 )
    >> {
    >> int c1, c2;
    >> for(;;)
    >> {
    >> c1 = tolower( (unsigned char) *s1++ );
    >> c2 = tolower( (unsigned char) *s2++ );
    >> if (c1 == 0 || c1 != c2)
    >> return c1 - c2;
    >> }
    >> }

    >
    >
    > One final revision. I've modified the return statement so that it
    > returns -1 / 0 / 1 to bring it in line with the behaviour of other
    > similar functions...


    Such as which ones?

    strcmp("foo","bar") returns 4 on my system, and probably yours. Only a
    positive value is required by the standard in that case.
    Jordan Abel, Nov 10, 2005
    #19
  20. pete

    Eric Sosman wrote:
    >
    > pete wrote On 11/10/05 15:30,:


    > > of characters (both interpreted as unsigned char)


    > The Standard is not entirely clear about what should
    > be done with negative `char' values. We know they need to
    > be made non-negative to become `unsigned char' values, but
    > is this to be done by conversion (my assumption) or by
    > reinterpretation (yours)?


    I'm seeing "interpreted as unsigned char"
    in the above quote from the standard.
    *(unsigned char *)byte is what "interpreted as" means.
    The standard isn't shy about using the word "converted".

    --
    pete
    pete, Nov 10, 2005
    #20