Is this a correct implementation of strstr ?

C

candide

I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :



char *mystrchr(const char *s, int c)
{
while ((unsigned char)*s != (unsigned char)c) {
if (*s == '\0')
return NULL;
++s;
}
return (char*)s;
}
 
I

Ian Collins

I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :

No, it looks like you are writing strchr.
char *mystrchr(const char *s, int c)
{
while ((unsigned char)*s != (unsigned char)c) {

Why the casts?
 
S

Seebs

I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :
char *mystrchr(const char *s, int c)
{
while ((unsigned char)*s != (unsigned char)c) {
if (*s == '\0')
return NULL;
++s;
}
return (char*)s;
}

I do not think this is a correct implementation of the standard
function str*str*. Apart from that, it looks basically correct
to me, although that could just be heatstroke or something, I went
out in the Big Blue Room, and was exposed for nearly an hour to
an essentially uncontrolled fusion reaction. :(

-s
 
P

Phil Carmody

Ian Collins said:
No, it looks like you are writing strchr.

Why the casts?

Because without them

char buff[LINELEN];
while(get_line_from_post(buff, sizeof(buff))) {
/* detect mangled indents from googoogroups */
char *mangledchars=buff;
while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
*mangledchars=' ';
}

may fail to fix any non-breaking spaces.
 
C

candide

Ian Collins a écrit :
No, it looks like you are writing strchr.

Yes, sorry for the misprint.


Why the casts?

Why not ;) ?



In fact, the code is not mine so I can't tell you why. I suppose the
casts are needed for the function to have a correct behaviour in order
to locate some character in the extended character set (accent for
instance).
 
I

Ian Collins

Ian Collins said:
No, it looks like you are writing strchr.

Why the casts?

Because without them

char buff[LINELEN];
while(get_line_from_post(buff, sizeof(buff))) {
/* detect mangled indents from googoogroups */
char *mangledchars=buff;
while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
*mangledchars=' ';
}

may fail to fix any non-breaking spaces.

Eh?
 
D

Dann Corbit

I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :

char *mystrchr(const char *s, int c)
{
while ((unsigned char)*s != (unsigned char)c) {
if (*s == '\0')
return NULL;
++s;
}
return (char*)s;
}

The casts add nothing.

This seems to be a user implementation of strchr() and not strstr().

With strstr() you are looking for a sequence of characters that match
and not a single character.

As long as you are writing a strstr() implementation, I suggest the BMH
algorithm.

P.S.
Is this one of those:

PARIS IN THE
THE SPRING

jokes?
 
P

Phil Carmody

Ian Collins said:
Ian Collins said:
On 04/14/10 09:01 AM, candide wrote:
I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :

No, it looks like you are writing strchr.


char *mystrchr(const char *s, int c)
{
while ((unsigned char)*s != (unsigned char)c) {

Why the casts?

Because without them

char buff[LINELEN];
while(get_line_from_post(buff, sizeof(buff))) {
/* detect mangled indents from googoogroups */
char *mangledchars=buff;
while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
*mangledchars=' ';
}

may fail to fix any non-breaking spaces.

Eh?

If chars are signed, then you'll be comparing char -96 with int 160,
and not find any matches.

Phil
 
C

candide

Dann Corbit a écrit :
The casts add nothing.

Does it mean the casts don't hurt ? In other words : may the casts lead
to a wrong behaviour ?

This seems to be a user implementation of strchr() and not strstr().



I meant strchr and not strstr. Sorry for the misprint.
 
E

Eric Sosman

Ian Collins said:
On 04/14/10 09:01 AM, candide wrote:
I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :

No, it looks like you are writing strchr.


char *mystrchr(const char *s, int c)
{
while ((unsigned char)*s != (unsigned char)c) {

Why the casts?

Because without them

char buff[LINELEN];
while(get_line_from_post(buff, sizeof(buff))) {
/* detect mangled indents from googoogroups */
char *mangledchars=buff;
while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
*mangledchars=' ';
}

may fail to fix any non-breaking spaces.

Eh?

If chars are signed, then you'll be comparing char -96 with int 160,
and not find any matches.

Yabbut, it's still wrong. The second argument to strchr()
is an int which is "converted to a char" (7.21.5.2p2), not to
an unsigned char. So the given code behaves differently from
the Standard's description for any char value C and int value I
such that

(C == (char)I) != ((unsigned char)C == (unsigned char)I)

Can such a C,I pair exist? Yes, certainly, if I is less than
CHAR_MIN or greater than CHAR_MAX, because then (char)I is ill-
defined and can produce a peculiar result (6.3.1.3p2). That
peculiar result need not be closely related to the (well-defined)
result of converting to unsigned char, so the two comparisons
could come out differently -- in which case, the fix is to remove
the `unsigned' from both casts and allow the peculiar behavior
to occur. It's what the Standard requires.
 
K

Keith Thompson

Dann Corbit said:
The casts add nothing.

I don't think that's true, though the second cast is probably
unnecessary.

Assume plain char is signed, with CHAR_BIT==8. Consider
*s == -96
c == 160
Without the casts, *s is promoted from plain char to int, with the
value -96, which is not equal to the int value 160. Converting the
char value -96 to unsigned char yields 160, which correctly promotes
to the value 160 of type int.

Interestingly the Standard's description of strchr() requires c, the
int argument to be converted to a char. Typically, given the above
assumptions, the int value 160 will be converted to the char value
-96, but this isn't actually guaranteed.
This seems to be a user implementation of strchr() and not strstr().

That was already acknowledged.

[snip]
 
I

Ian Collins

Ian Collins said:
On 04/14/10 09:01 AM, candide wrote:
I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :

No, it looks like you are writing strchr.


char *mystrchr(const char *s, int c)
{
while ((unsigned char)*s != (unsigned char)c) {

Why the casts?

Because without them

char buff[LINELEN];
while(get_line_from_post(buff, sizeof(buff))) {
/* detect mangled indents from googoogroups */
char *mangledchars=buff;
while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
*mangledchars=' ';
}

may fail to fix any non-breaking spaces.

Eh?

If chars are signed, then you'll be comparing char -96 with int 160,
and not find any matches.
So? a simple cast to char would be fine:

while (*s != (char)c)

As the standard says (my emphasis):

"The strchr function locates the first occurrence of c *(converted to a
char)* in string s".
 
S

Seebs

In fact, the code is not mine so I can't tell you why. I suppose the
casts are needed for the function to have a correct behaviour in order
to locate some character in the extended character set (accent for
instance).

I don't think they are, actually. The "unsigned char" thing is needed
when using is*() functions, and for processing things like the result of
getchar(), etcetera, but in the case of strchr, it seems as though plain
char is right.

-s
 
P

Peter Nilsson

Phil Carmody said:
If chars are signed, then you'll be comparing char -96 with
int 160, and not find any matches.

The standard requires c to be converted to a char for strchr.
In which case, strchr effectively invokes undefined behaviour
(in C99) if char is signed and c is not in the range of signed
char.
 
P

Phil Carmody

Eric Sosman said:
Ian Collins said:
On 04/14/10 09:19 AM, Phil Carmody wrote:
On 04/14/10 09:01 AM, candide wrote:
I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :

No, it looks like you are writing strchr.


char *mystrchr(const char *s, int c)
{
while ((unsigned char)*s != (unsigned char)c) {

Why the casts?

Because without them

char buff[LINELEN];
while(get_line_from_post(buff, sizeof(buff))) {
/* detect mangled indents from googoogroups */
char *mangledchars=buff;
while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
*mangledchars=' ';
}

may fail to fix any non-breaking spaces.

Eh?

If chars are signed, then you'll be comparing char -96 with int 160,
and not find any matches.

Yabbut, it's still wrong. The second argument to strchr()
is an int which is "converted to a char" (7.21.5.2p2), not to
an unsigned char.

I was addressing just mystrchr as posted, not strchr. However, as
you continue, the comparison is a most enlightening one.
So the given code behaves differently from
the Standard's description for any char value C and int value I
such that

(C == (char)I) != ((unsigned char)C == (unsigned char)I)

Can such a C,I pair exist? Yes, certainly, if I is less than
CHAR_MIN or greater than CHAR_MAX, because then (char)I is ill-
defined and can produce a peculiar result (6.3.1.3p2). That
peculiar result need not be closely related to the (well-defined)
result of converting to unsigned char, so the two comparisons
could come out differently -- in which case, the fix is to remove
the `unsigned' from both casts and allow the peculiar behavior
to occur. It's what the Standard requires.

I don't know quite how much 'the standard is occasionally a bit dumb'
was supposed to be in that, but with my 'the standard is occasionally
a bit dumb'-tinted spectacles, I certainly detected some.

So, got any portable C code for turning latin-1 NBSPs from googoogroups
into ' ' using strchr to find the errant chars?

Phil
 
P

Phil Carmody

Ian Collins said:
Ian Collins said:
On 04/14/10 09:19 AM, Phil Carmody wrote:
On 04/14/10 09:01 AM, candide wrote:
I request your opinion about the following attempt to implement the
standard function strstr. Here is the code :

No, it looks like you are writing strchr.


char *mystrchr(const char *s, int c)
{
while ((unsigned char)*s != (unsigned char)c) {

Why the casts?

Because without them

char buff[LINELEN];
while(get_line_from_post(buff, sizeof(buff))) {
/* detect mangled indents from googoogroups */
char *mangledchars=buff;
while ((mangledchars=mystrchr(mangledchars, 160)) != NULL)
*mangledchars=' ';
}

may fail to fix any non-breaking spaces.

Eh?

If chars are signed, then you'll be comparing char -96 with int 160,
and not find any matches.

So?

So it won't do what was requred of it.
a simple cast to char would be fine:

Now that's weasely wording. The cast you like is 'simple', but the
two casts from the OP are so horrible they must be avoided no matter
what?
while (*s != (char)c)

As the standard says (my emphasis):

"The strchr function locates the first occurrence of c *(converted to
a char)* in string s".

But, on the DS-2010, (char)'\xA0' is 127. That's going to replace all
the DEL characters instead!

Phil
 
P

Phil Carmody

Seebs said:
I don't think they are, actually. The "unsigned char" thing is needed
when using is*() functions, and for processing things like the result of
getchar(), etcetera, but in the case of strchr, it seems as though plain
char is right.

The values of the hex and octal escapes in character constants are
unsigned char too. (That are mapped onto ints as they become the
actual character constants.)

Phil
 
I

Ian Collins

So it won't do what was requred of it.


Now that's weasely wording. The cast you like is 'simple', but the
two casts from the OP are so horrible they must be avoided no matter
what?

Yes, one is bad enough thank you!
But, on the DS-2010, (char)'\xA0' is 127. That's going to replace all
the DEL characters instead!

So it doesn't have a conforming C implementation...
 
E

Eric Sosman

[...]
So, got any portable C code for turning latin-1 NBSPs from googoogroups
into ' ' using strchr to find the errant chars?

Assuming it's "portable" to know the numeric code of the
character in question,

#define NBSP ...whatever...
for (char *p; (p = strchr(text, NBSP)) != NULL; )
*p = ' ';

.... should do it. Knowing your system's own value for NBSP is
the non-portable part, and I can't think of a way to make it
100% portable. (Keep in mind that the character encoding on
the originating system may not be the same as on yours, and that
translations may have occurred en route. Indeed, if the origin
used the value 160 for NBSP and your system has CHAR_MAX<160,
translation *must* have occurred.)
 
E

Eric Sosman

[...]
So, got any portable C code for turning latin-1 NBSPs from googoogroups
into ' ' using strchr to find the errant chars?

Assuming it's "portable" to know the numeric code of the
character in question,

#define NBSP ...whatever...
for (char *p; (p = strchr(text, NBSP)) != NULL; )
*p = ' ';

Um, er, make that

#define NBSP ...
for (char *p = text; (p = strchr(p, NBSP)) != NULL; )
*p++ = ' ';

Sorry for the thinko.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,773
Messages
2,569,594
Members
45,123
Latest member
Layne6498
Top