Question on strncmp / strnicmp use

C. J. Clegg · Jan 16, 2009

I have two null-terminated character strings, str1 and str2. I don't
know which one is longer, but I want to know if the shortest one
(whichever one that might be) matches the first part of the longer
one.

I can say:

if ( strncmp( str1, str2, strlen( str2 ) ) == 0 ||
strncmp( str2, str1, strlen( str1 ) ) == 0 )

but that seems a bit kludgy and inelegant.

Is there a better way?

Bartc · Jan 16, 2009

C. J. Clegg said:
I have two null-terminated character strings, str1 and str2. I don't
know which one is longer, but I want to know if the shortest one
(whichever one that might be) matches the first part of the longer
one.

I can say:

if ( strncmp( str1, str2, strlen( str2 ) ) == 0 ||
strncmp( str2, str1, strlen( str1 ) ) == 0 )

but that seems a bit kludgy and inelegant.

I'm not familiar with strncmp, but it seems to stop comparing at a null, so
perhaps you only need one of those strncmp() calls? Either length will work.

vippstar · Jan 16, 2009

I have two null-terminated character strings, str1 and str2. I don't
know which one is longer, but I want to know if the shortest one
(whichever one that might be) matches the first part of the longer
one.

I can say:

if ( strncmp( str1, str2, strlen( str2 ) ) == 0 ||
strncmp( str2, str1, strlen( str1 ) ) == 0 )

but that seems a bit kludgy and inelegant.

Is there a better way?

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

int mycomparison(const char *s1, const char *s2) {
for(; *s1 == *s2; s1++, s2++);
if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;
if((unsigned char)*s1 < *s2) return -1; else return (unsigned char)
*s1 > *s2;
}

vippstar · Jan 16, 2009

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

int mycomparison(const char *s1, const char *s2) {
for(; *s1 == *s2; s1++, s2++);
if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;
if((unsigned char)*s1 < *s2) return -1; else return (unsigned char)
*s1 > *s2;

}

Another solution:

if(strstr(p, q) == p)

Ben Bacarisse · Jan 16, 2009

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

Very similar tests are need when one writes this function. You have
to test the characters from both strings and increment two pointers.
strlen might well be kinder on the cache and might use a single
machine instruction. Anyway, my point is that I, for one, am
constantly amazed by how often my intuition about efficiency wrong.
This means that you could well be right, of course!

To the OP: write clearly, then measure if you have time or need to
optimise.

int mycomparison(const char *s1, const char *s2) {
for(; *s1 == *s2; s1++, s2++);

This loop has UB. If the strings are the same length it starts to
look beyond the nulls. Of course, this may be well-defined but, in
general, it is not safe.

if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;

Logically, the != test is not needed. There is only one why for the
previous loop to finish and that is with *s1 != *s2. Of course, the
UB means that such points are moot.

Bartc · Jan 16, 2009

Bartc said:
I'm not familiar with strncmp, but it seems to stop comparing at a null,
so perhaps you only need one of those strncmp() calls? Either length will
work.

Hmm, the wording of the strncmp() function in the C99 spec suggests that the
extra characters of the longer string are ignored. That's not correct.

You still just need a single call to strncmp(), but need to find min() of
the two lengths first.

Richard · Jan 16, 2009

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

int mycomparison(const char *s1, const char *s2) {
for(; *s1 == *s2; s1++, s2++);
if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;
if((unsigned char)*s1 < *s2) return -1; else return (unsigned char)
*s1 > *s2;
}

In C there are a few basic tests one does on each and every function
involving strings.

And strings the same length would be one such.

Your function would probably crash.

vippstar · Jan 16, 2009

Very similar tests are need when one writes this function. You have
to test the characters from both strings and increment two pointers.
strlen might well be kinder on the cache and might use a single
machine instruction. Anyway, my point is that I, for one, am
constantly amazed by how often my intuition about efficiency wrong.
This means that you could well be right, of course!

Here's what I have in mind:

strlen can't be less than O(n) (optimizations that, say, check 4
elements at once are still O(n))
mycomparison can't be less than O(n).

Using just mycomparison instead of strlen + mycomparison is one less O
(n) function called, so in that way, I'm correct.
You mention increment of pointers and comparison, but those operations
are O(1).

All that is not supported by the standard which doesn't even mention
the concept of efficiency/time of operation, but I hope you can see
what I'm trying to say.

This loop has UB. If the strings are the same length it starts to
look beyond the nulls. Of course, this may be well-defined but, in
general, it is not safe.

Yes, you are right. Fix: (I hope)

for(; *s1 == *s2; s1++, s2++) if(*s1 == 0) break;

Ben Bacarisse · Jan 16, 2009

Bartc said:
I'm not familiar with strncmp, but it seems to stop comparing at a
null, so perhaps you only need one of those strncmp() calls? Either
length will work.

Nope. The OP can get away with one strncmp and two strlens:

int isprefix(const char *str1, const char *str2)
{
size_t l1 = strlen(str1), l2 = strlen(str2);
return strncmp(str1, str2, l1 > l2 ? l2 : l1) == 0;
}

or he/she can roll his/her own but that is error prone. My go would
be:

int isprefix(const char *str1, const char *str2)
{
while (*str1 && *str1 == *str2) str1++, str2++;
return !*str1 || !*str2;
}

vippstar · Jan 16, 2009

char *p = "Hello, world!";
char *q = "Hell";

I think you got the order of the strings wrong, but I see what you
mean. I goofed again.

Richard · Jan 16, 2009

I think you got the order of the strings wrong, but I see what you
mean. I goofed again.

He didn't and you did again.

Ben Bacarisse · Jan 16, 2009

Here's what I have in mind:

strlen can't be less than O(n) (optimizations that, say, check 4
elements at once are still O(n))
mycomparison can't be less than O(n).

Using just mycomparison instead of strlen + mycomparison is one less O
(n) function called, so in that way, I'm correct.
You mention increment of pointers and comparison, but those operations
are O(1).

All that is not supported by the standard which doesn't even mention
the concept of efficiency/time of operation, but I hope you can see
what I'm trying to say.

Yes I can, and you may be right. I hope you can see what I am trying
to say, which is that it is not wise to speculate about efficiency
unless the case is an obvious one. There may, for example, be no
calls to strlen.

In terms of Big O, a hand-rolled function can be O(min(n, m)) whereas
the strlen-based solution is O(n + m). I agree that it looks good for
a hand-coded loop but I have been caught out before trying to out-code
the C library.

Ben Bacarisse · Jan 16, 2009

Richard said:
(e-mail address removed) writes:

In C there are a few basic tests one does on each and every function
involving strings.

And strings the same length would be one such.

I don't see a problem caused by strings of the same length.

Your function would probably crash.

If you pick strings of the same length with the required extra
property you may get lucky, but this is a case where testing could
easily fail to show the problem.

Ben Bacarisse · Jan 16, 2009

Richard said:
He didn't and you did again.

He did and now you have. My turn...

Barry Schwarz · Jan 16, 2009

Here's what I have in mind:

strlen can't be less than O(n) (optimizations that, say, check 4
elements at once are still O(n))
mycomparison can't be less than O(n).

Using just mycomparison instead of strlen + mycomparison is one less O
(n) function called, so in that way, I'm correct.
You mention increment of pointers and comparison, but those operations
are O(1).

All that is not supported by the standard which doesn't even mention
the concept of efficiency/time of operation, but I hope you can see
what I'm trying to say.

As an abstract theoretical exercise, maybe you are right. But I
think your frame of reference may be restricted by your experience
with a limited number of systems.

There are machines where strlen can be implemented as a single
instruction. Under the as if rule, the compiler need not even
generate a function call.

The same is true regarding strncmp. Furthermore, these machines can
allow both strings to populate cache.

Standards seem to have a life of 10+ years while hardware development
proceeds at a much faster rate. Since the manufacturer of one such
machine is on the standards committee (and made these optimizations
primarily to support C and C++), this may one reason the standard
omits such considerations.

Barry Schwarz · Jan 16, 2009

In C there are a few basic tests one does on each and every function
involving strings.

And strings the same length would be one such.

Your function would probably crash.

Why do you think that? I don't see any undefined behavior if the
strings are well formed.

Barry Schwarz · Jan 16, 2009

Writing your own function is the most efficient way, and also a good
solution.
It's more efficient than strncmp, because for strncmp you have to pass
the length too, which you usually compute with strlen, which means you
iterate the string one more time than needed.

int mycomparison(const char *s1, const char *s2) {
for(; *s1 == *s2; s1++, s2++);
if(*s1 != *s2 && (*s1 == 0 || *s2 == 0)) return 0;
if((unsigned char)*s1 < *s2) return -1; else return (unsigned char)
*s1 > *s2;

Why go through a third comparison between *s1 and *s2 when there is
only one possible result?

vippstar · Jan 16, 2009

As an abstract theoretical exercise, maybe you are right. But I
think your frame of reference may be restricted by your experience
with a limited number of systems.

There are machines where strlen can be implemented as a single
instruction. Under the as if rule, the compiler need not even
generate a function call.

The same is true regarding strncmp. Furthermore, these machines can
allow both strings to populate cache.

Standards seem to have a life of 10+ years while hardware development
proceeds at a much faster rate. Since the manufacturer of one such
machine is on the standards committee (and made these optimizations
primarily to support C and C++), this may one reason the standard
omits such considerations.

Thanks, with your explanation now I see the point Ben Bacarisse made,
and I agree with both.

vippstar · Jan 16, 2009

Why go through a third comparison between *s1 and *s2 when there is
only one possible result?

It seems I'm unable to write code without bugs today. In my defense, I
have an annoying fever.
Ben Bacarisse posted what I tried to implement here:
Message-ID: <[email protected]>

Richard · Jan 16, 2009

Ben Bacarisse said:
He did and now you have. My turn...

No.. My turn. And sorry, my humour is not the best at times..

String constants	8	Jul 31, 2006
Reversing order of words in a given string	2	Apr 26, 2007
strcmp/strncmp/strnicmp	1	Mar 29, 2005
Can someone tell me what's wrong with this question on StackOverflow?	0	Aug 19, 2023
regexp or array?	10	Feb 18, 2009
Fixing escaped characters python-xbee	4	Apr 24, 2013
Void pointers	8	Jun 21, 2009
Pascal - C (2)	57	Nov 1, 2008

Question on strncmp / strnicmp use

C. J. Clegg

Bartc

vippstar

vippstar

Ben Bacarisse

Bartc

Richard

vippstar

Ben Bacarisse

vippstar

Richard

Ben Bacarisse

Ben Bacarisse

Ben Bacarisse

Barry Schwarz

Barry Schwarz

Barry Schwarz

vippstar

vippstar

Richard

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads