Question on strncmp / strnicmp use

C

C. J. Clegg

I have two null-terminated character strings, str1 and str2. I don't
know which one is longer, but I want to know if the shortest one
(whichever one that might be) matches the first part of the longer
one.

I can say:

if ( strncmp( str1, str2, strlen( str2 ) ) == 0 ||
strncmp( str2, str1, strlen( str1 ) ) == 0 )

but that seems a bit kludgy and inelegant.

Is there a better way?
 
B

Bartc

C. J. Clegg said:
I have two null-terminated character strings, str1 and str2. I don't
know which one is longer, but I want to know if the shortest one
(whichever one that might be) matches the first part of the longer
one.

I can say:

if ( strncmp( str1, str2, strlen( str2 ) ) == 0 ||
strncmp( str2, str1, strlen( str1 ) ) == 0 )

but that seems a bit kludgy and inelegant.

I'm not familiar with strncmp, but it seems to stop comparing at a null, so
perhaps you only need one of those strncmp() calls? Either length will work.
 
V

vippstar

I have two null-terminated character strings, str1 and str2.  I don't
know which one is longer, but I want to know if the shortest one
(whichever one that might be) matches the first part of the longer
one.

I can say:

if ( strncmp( str1, str2, strlen( str2 ) ) == 0 ||
     strncmp( str2, str1, strlen( str1 ) )  == 0 )

but that seems a bit kludgy and inelegant.

Is there a better way?

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

int mycomparison(const char *s1, const char *s2) {
for(; *s1 == *s2; s1++, s2++);
if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;
if((unsigned char)*s1 < *s2) return -1; else return (unsigned char)
*s1 > *s2;
}
 
V

vippstar

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

int mycomparison(const char *s1, const char *s2) {
    for(; *s1 == *s2; s1++, s2++);
    if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;
    if((unsigned char)*s1 < *s2) return -1; else return (unsigned char)
*s1 > *s2;

}

Another solution:

if(strstr(p, q) == p)
 
B

Ben Bacarisse

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

Very similar tests are need when one writes this function. You have
to test the characters from both strings and increment two pointers.
strlen might well be kinder on the cache and might use a single
machine instruction. Anyway, my point is that I, for one, am
constantly amazed by how often my intuition about efficiency wrong.
This means that you could well be right, of course!

To the OP: write clearly, then measure if you have time or need to
optimise.
int mycomparison(const char *s1, const char *s2) {
for(; *s1 == *s2; s1++, s2++);

This loop has UB. If the strings are the same length it starts to
look beyond the nulls. Of course, this may be well-defined but, in
general, it is not safe.
if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;

Logically, the != test is not needed. There is only one why for the
previous loop to finish and that is with *s1 != *s2. Of course, the
UB means that such points are moot.
 
B

Bartc

Bartc said:
I'm not familiar with strncmp, but it seems to stop comparing at a null,
so perhaps you only need one of those strncmp() calls? Either length will
work.

Hmm, the wording of the strncmp() function in the C99 spec suggests that the
extra characters of the longer string are ignored. That's not correct.

You still just need a single call to strncmp(), but need to find min() of
the two lengths first.
 
R

Richard

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

int mycomparison(const char *s1, const char *s2) {
for(; *s1 == *s2; s1++, s2++);
if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;
if((unsigned char)*s1 < *s2) return -1; else return (unsigned char)
*s1 > *s2;
}

In C there are a few basic tests one does on each and every function
involving strings.

And strings the same length would be one such.

Your function would probably crash.
 
V

vippstar

Very similar tests are need when one writes this function.  You have
to test the characters from both strings and increment two pointers.
strlen might well be kinder on the cache and might use a single
machine instruction.  Anyway, my point is that I, for one, am
constantly amazed by how often my intuition about efficiency wrong.
This means that you could well be right, of course!

Here's what I have in mind:

strlen can't be less than O(n) (optimizations that, say, check 4
elements at once are still O(n))
mycomparison can't be less than O(n).

Using just mycomparison instead of strlen + mycomparison is one less O
(n) function called, so in that way, I'm correct.
You mention increment of pointers and comparison, but those operations
are O(1).

All that is not supported by the standard which doesn't even mention
the concept of efficiency/time of operation, but I hope you can see
what I'm trying to say.
This loop has UB.  If the strings are the same length it starts to
look beyond the nulls.  Of course, this may be well-defined but, in
general, it is not safe.

Yes, you are right. Fix: (I hope)

for(; *s1 == *s2; s1++, s2++) if(*s1 == 0) break;
 
B

Ben Bacarisse

Bartc said:
I'm not familiar with strncmp, but it seems to stop comparing at a
null, so perhaps you only need one of those strncmp() calls? Either
length will work.

Nope. The OP can get away with one strncmp and two strlens:

int isprefix(const char *str1, const char *str2)
{
size_t l1 = strlen(str1), l2 = strlen(str2);
return strncmp(str1, str2, l1 > l2 ? l2 : l1) == 0;
}

or he/she can roll his/her own but that is error prone. My go would
be:

int isprefix(const char *str1, const char *str2)
{
while (*str1 && *str1 == *str2) str1++, str2++;
return !*str1 || !*str2;
}
 
V

vippstar

        char *p = "Hello, world!";
        char *q = "Hell";

I think you got the order of the strings wrong, but I see what you
mean. I goofed again. :p
 
B

Ben Bacarisse

Here's what I have in mind:

strlen can't be less than O(n) (optimizations that, say, check 4
elements at once are still O(n))
mycomparison can't be less than O(n).

Using just mycomparison instead of strlen + mycomparison is one less O
(n) function called, so in that way, I'm correct.
You mention increment of pointers and comparison, but those operations
are O(1).

All that is not supported by the standard which doesn't even mention
the concept of efficiency/time of operation, but I hope you can see
what I'm trying to say.

Yes I can, and you may be right. I hope you can see what I am trying
to say, which is that it is not wise to speculate about efficiency
unless the case is an obvious one. There may, for example, be no
calls to strlen.

In terms of Big O, a hand-rolled function can be O(min(n, m)) whereas
the strlen-based solution is O(n + m). I agree that it looks good for
a hand-coded loop but I have been caught out before trying to out-code
the C library.
 
B

Ben Bacarisse

Richard said:
(e-mail address removed) writes:

In C there are a few basic tests one does on each and every function
involving strings.

And strings the same length would be one such.

I don't see a problem caused by strings of the same length.
Your function would probably crash.

If you pick strings of the same length with the required extra
property you may get lucky, but this is a case where testing could
easily fail to show the problem.
 
B

Barry Schwarz

Here's what I have in mind:

strlen can't be less than O(n) (optimizations that, say, check 4
elements at once are still O(n))
mycomparison can't be less than O(n).

Using just mycomparison instead of strlen + mycomparison is one less O
(n) function called, so in that way, I'm correct.
You mention increment of pointers and comparison, but those operations
are O(1).

All that is not supported by the standard which doesn't even mention
the concept of efficiency/time of operation, but I hope you can see
what I'm trying to say.

As an abstract theoretical exercise, maybe you are right. But I
think your frame of reference may be restricted by your experience
with a limited number of systems.

There are machines where strlen can be implemented as a single
instruction. Under the as if rule, the compiler need not even
generate a function call.

The same is true regarding strncmp. Furthermore, these machines can
allow both strings to populate cache.

Standards seem to have a life of 10+ years while hardware development
proceeds at a much faster rate. Since the manufacturer of one such
machine is on the standards committee (and made these optimizations
primarily to support C and C++), this may one reason the standard
omits such considerations.
 
B

Barry Schwarz

In C there are a few basic tests one does on each and every function
involving strings.

And strings the same length would be one such.

Your function would probably crash.

Why do you think that? I don't see any undefined behavior if the
strings are well formed.
 
B

Barry Schwarz

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

int mycomparison(const char *s1, const char *s2) {
for(; *s1 == *s2; s1++, s2++);
if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;
if((unsigned char)*s1 < *s2) return -1; else return (unsigned char)
*s1 > *s2;

Why go through a third comparison between *s1 and *s2 when there is
only one possible result?
 
V

vippstar

As an abstract theoretical exercise, maybe you are right.   But I
think your frame of reference may be restricted by your experience
with a limited number of systems.

There are machines where strlen can be implemented as a single
instruction.  Under the as if rule, the compiler need not even
generate a function call.

The same is true regarding strncmp.  Furthermore, these machines can
allow both strings to populate cache.

Standards seem to have a life of 10+ years while hardware development
proceeds at a much faster rate.  Since the manufacturer of one such
machine is on the standards committee (and made these optimizations
primarily to support C and C++), this may one reason the standard
omits such considerations.

Thanks, with your explanation now I see the point Ben Bacarisse made,
and I agree with both.
 
V

vippstar

Why go through a third comparison between *s1 and *s2 when there is
only one possible result?

It seems I'm unable to write code without bugs today. In my defense, I
have an annoying fever.
Ben Bacarisse posted what I tried to implement here:
Message-ID: <[email protected]>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,566
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top