Implementing strstr

S

spinoza1111

In the replace() program of last month's flame festival, a little
program was trying to get out. Here it is: an implementation of strstr
including a call that returns the offset of the found substring. Two
hours including all comments and dedicatory ode, written for this
occasion.

#include <stdlib.h>
#include <stdio.h>

// ***************************************************************
// * *
// * strstr *
// * *
// * This function (strstr) finds a string, probably as fast as *
// * possible without extra memory usage over and above brute *
// * force. *
// * *
// * In searching a Nul terminated string for a substring, there *
// * are logically three possibilities in a left to right *
// * traversal of the master string that (1) looks for the *
// * first character of the target and then (2) matches all the *
// * remaining characters:
// * *
// * * (Erroneous): on the failure of a partial match, *
// * restart at the first nonmatching character. This is *
// * fast but wrong, since the matching string may *
// * overlap the partial match. *
// * *
// * * (Too slow): on the failure of a partial match, start*
// * one past the first character (of the partial match) *
// * *
// * * (Just right): while matching characters, note the *
// * leftmost character in the searched string, to the *
// * right of the first matched character, that matches *
// * both that character and, of course, the first *
// * character of the target. *
// * *
// * C H A N G E R E C O R D --------------------------------- *
// * DATE PROGRAMMER DESCRIPTION OF CHANGE *
// * -------- ---------- --------------------------------- *
// * 03 18 10 Nilges Version 1.0 *
// * *
// * ----------------------------------------------------------- *
// * *
// * To find a string, oh Muse! I sing, inside another String! *
// * Alternatives to me come in Three, ah, that's the thing: *
// * For the one thing for which the Wise must watch is mayhap, *
// * Partial occurences that most melancholy, overlap. *
// * The first is base, mechanical, low, and tragicomical: *
// * It's to restart from the previous beginning plus but One *
// * Oh what Mayhem to true Programming is thereby, done! *
// * But the job it will do, as did Hercules, *
// * His Labors for the Goddess cruel in Seneca's tragedies: *
// * Arduously and ignobly like unto the meanest Hind *
// * That knoweth not his Elbow from his Behind. *
// * The second is worse, a boner, a solecism, and a Seebach: *
// * The second restarts at the character that doth match! *
// * Oh muse! Such hellish Sights before me yawn: *
// * But be assur'd, 'tis darkest just before the Dawn. *
// * Shout for Victory, oh Thrace, and smite the Harp, and Grin: *
// * For lo, we start at the leftmost "handle" of the string *
// * When it occureth in *
// * The tragic partial match that hath failed us. *
// * If no such handle exists, then we can restart *
// * At the point of match failure: no, 'tis not a brain fart. *
// * Now we spy our magic bus: *
// * For this is the best Al Gore ithm *
// * That we can hope for in C, a language without Rhyme, or *
// * for that matter, Oh Muse! rhythm. *
// * *
// ***************************************************************

#define TRUTH -1
#define FALSITY 0
#define NULLITY 0

char * strstrWithIndex(char *strMaster,
char *strTarget,
int *ptrIndex)
{
char *ptrMaster = NULLITY;
char *ptrTarget = NULLITY;
char *ptrHandle = NULLITY;
int booFound = FALSITY;
if (!*strMaster || !*strTarget) return 0;
for (ptrMaster = strMaster; *ptrMaster;)
{
for (;
*ptrMaster && *ptrMaster != *strTarget;
ptrMaster++);
ptrTarget = strTarget;
*ptrIndex = ptrMaster - strMaster;
ptrHandle = 0;
for (;
*ptrTarget
?
(*ptrMaster
?
(*ptrMaster==*ptrTarget ? TRUTH : FALSITY)
:
FALSITY)
:
(booFound = TRUTH, FALSITY);
ptrMaster++, ptrTarget++)
{
if (ptrHandle = 0
&&
ptrMaster > strMaster
&&
*ptrMaster == *strTarget)
ptrHandle = ptrTarget;
}
if (booFound) return strMaster + *ptrIndex;
if (ptrHandle) ptrMaster = ptrHandle + 1;
}
*ptrIndex = 0;
return 0;
}

char * strstr(char *strMaster, char *strTarget)
{
int ptrIndex = 0;
return strstrWithIndex(strMaster, strTarget, &ptrIndex);
}

int main(void)
{
char *ptrIndex1 = NULLITY;
int intIndex1 = 0;
printf("strstr Simplified\n\n");
printf("Expect 0: %d\n", strstr("", ""));
printf("Expect 0: %d\n", strstr("0123456789", ""));
printf("Expect 0: %d\n", strstr("", "0"));
printf("Expect 0: %d\n", strstr("Here", "There"));
ptrIndex1 = strstrWithIndex("There", "here", &intIndex1);
printf("Expect 1: %d\n", intIndex1);
ptrIndex1 = strstrWithIndex("They seek him here",
"here",
&intIndex1);
printf("Expect 14: %d\n", intIndex1);
ptrIndex1 = strstrWithIndex("They seek him there",
"here",
&intIndex1);
printf("Expect 15: %d\n", intIndex1);
ptrIndex1 = strstrWithIndex
("The clc regs seek him everywhere",
"here",
&intIndex1);
printf("Expect 28: %d\n", intIndex1);
printf("Expect 'h': %c\n", *ptrIndex1);
ptrIndex1 = strstrWithIndex
("Is he in Heaven? Or in Hell?",
"?",
&intIndex1);
printf("Expect 15: %d\n", intIndex1);
printf("Expect '?': %c\n", *ptrIndex1);
ptrIndex1 = strstrWithIndex
("That damn'd elusive Spinoza won't tell!",
"Spinoza",
&intIndex1);
printf("Expect 20: %d\n", intIndex1);
printf("Expect 'p': %c\n", *(ptrIndex1+1));
printf("Expect '0': %c\n", *strstr("0123456789", "0"));
printf("Expect '1': %c\n", *strstr("0123456789", "1"));
printf("Expect '0': %c\n", *strstr("0123456789", "0"));
printf("Expect '9': %c\n", *strstr("0123456789", "9"));
printf("Expect '5': %c\n", *strstr("0123456789", "345") + 2);
printf("Expect '8': %c\n", *strstr("0123456789", "89"));
ptrIndex1 = strstrWithIndex("0123456789A89AB",
"89AB",
&intIndex1);
printf("Expect 11: %d\n", intIndex1);
return 0;
}
 
D

Dr Malcolm McLean

// * This function (strstr) finds a string, probably as fast as  *
// * possible without extra memory usage over and above brute    *
// * force.                                                      *
There's a faster algorithm if a) the search string is long, and b) you
know the characteristics of the strings (natural English language,
protein sequence C code etc).

This is to scan the query string and take the two characters with the
least probability of appearing in your set. Eg for English language,
if the query is "quaker" you'd take 'q' and 'k'. Ypou then scan
looking only for that pair. If you find an exact match to the pair ypu
do a full scan.
 
S

spinoza1111

There's a faster algorithm if a) the search string is long, and b) you
know the characteristics of the strings (natural English language,
protein sequence C code etc).

This is to scan the query string and take the two characters with the
least probability of appearing in your set. Eg for English language,
if the query is "quaker" you'd take 'q' and 'k'. Ypou then scan
looking only for that pair. If you find an exact match to the pair ypu
do a full scan.

Interesting insight. I'd add a third overload allowing an optional
ranking of characters, with a #define symbol giving the ranking for
English text. I don't want the program to "learn" by remembering older
frequencies, since that is state, and state is best handled by object
oriented languages.

Why not scan for the least frequent character and then look around it
to see if it is in a substring?
 
S

spinoza1111

Problem with strstr is that there is no strnstr.
For example:

  if(!strcmp(readBuf_,"\r\n"))
  {
   rc = true;
  }
  else if((end = strstr(readBuf_,"\r\n\r\n")) && size_t(end-readBuf_)<=readBufSize_)
  {
     res_.append(readBuf_,end-readBuf_);
     rc = true;
  }

Disregard res_ as it is c++, but c++ doesn't have strnstr either!
How to use strstr safely when there is o strnstr and you work
with httpd 512 byte buffer eg?

I don't see the problem. "My" strstr simply assumes it will be passed
a Nul terminated string. If it falls off the edge of a cliff, isn't
this the user's problem?

I grant that there is a genuine problem in C tools that modify memory
such as printf. But in general, strstr and suchlike tools have no way
of protecting themselves that I'm aware of against bad data.
 
S

spinoza1111

That's not what strstr is supposed to do.
When the second argument to strstr points to an empty string,
then strstr is supposed to return the first argument,
which happens to be a non null pointer in the above cases.

Ill-defined. I don't choose to simulate mistakes, here. A null master
string is one in which a target string can never occur: a null target
could only occur in a null master string, but see rule one.

Perhaps I'm being narrow minded.

I suppose you could make a case, in a world biased towards left to
right, that all strings start with a magic invisible null string,
including a null master string.

So test for a null target, and if it is found, return the master
string's address and an index of 0 in strstrWithIndex().

Then, and only then, return 0 (not found) if the master string is
null.

You know what? Done.

#include <stdlib.h>
#include <stdio.h>

// ***************************************************************
// * *
// * strstr *
// * *
// * This function (strstr) finds a string, probably as fast as *
// * possible without extra memory usage over and above brute *
// * force. *
// * *
// * In searching a Nul terminated string for a substring, there *
// * are logically three possibilities in a left to right *
// * traversal of the master string that (1) looks for the *
// * first character of the target and then (2) matches all the *
// * remaining characters: *
// * *
// * * (Erroneous): on the failure of a partial match, *
// * restart at the first nonmatching character. This is *
// * fast but wrong, since the matching string may *
// * overlap the partial match. *
// * *
// * * (Too slow): on the failure of a partial match, start*
// * one past the first character (of the partial match) *
// * *
// * * (Just right): while matching characters, note the *
// * leftmost character in the searched string, to the *
// * right of the first matched character, that matches *
// * both that character and, of course, the first *
// * character of the target. *
// * *
// * C H A N G E R E C O R D --------------------------------- *
// * DATE PROGRAMMER DESCRIPTION OF CHANGE *
// * -------- ---------- --------------------------------- *
// * 03 18 10 Nilges Version 1.0 *
// * *
// * 03 19 10 Nilges Version 1.1 *
// * *
// * 1. Incorporates Pete's suggestion*
// * that a null target string is *
// * always found at the start of *
// * the master string. *
// * *
// * 2. Results display enhanced *
// * *
// * 3. Poem corrected (doth NOT *
// * match) *
// * *
// * ----------------------------------------------------------- *
// * *
// * To find a string, oh Muse! I sing, inside another String! *
// * Alternatives to me come in Three, ah, that's the thing: *
// * For the one thing for which the Wise must watch is mayhap, *
// * Partial occurences that most melancholy, overlap. *
// * The first is base, mechanical, low, and tragicomical: *
// * It's to restart from the previous beginning plus but One *
// * Oh what Mayhem to true Programming is thereby, done! *
// * But the job it will do, as did Hercules, *
// * His Labors for the Goddess cruel in Seneca's tragedies: *
// * Arduously and ignobly like unto the meanest Hind *
// * That knoweth not his Elbow from his Behind. *
// * The second is worse, a boner, a solecism, and a Seebach: *
// * The second restarts at the character that doth not match! *
// * Oh muse! Such hellish Sights before me yawn: *
// * But be assur'd, 'tis darkest just before the Dawn. *
// * Shout for Victory, oh Thrace, and smite the Harp, and Grin: *
// * For lo, we start at the leftmost "handle" of the string *
// * When it occureth in *
// * The tragic partial match that hath failed us. *
// * If no such handle exists, then we can restart *
// * At the point of match failure: no, 'tis not a brain fart. *
// * Now we spy our magic bus: *
// * For this is the best Al Gore ithm *
// * That we can hope for in C, a language without Rhyme, or *
// * for that matter, Oh Muse! rhythm. *
// * *
// ***************************************************************

#define TRUTH -1
#define FALSITY 0
#define NULLITY 0

char * strstrWithIndex(char *strMaster,
char *strTarget,
int *ptrIndex)
{
char *ptrMaster = NULLITY;
char *ptrTarget = NULLITY;
char *ptrHandle = NULLITY;
int booFound = FALSITY;
*ptrIndex = 0; // Rel. 1.1
if (!*strTarget) return strMaster; // Rel. 1.1
if (!*strMaster) return 0; // Rel. 1.1
for (ptrMaster = strMaster; *ptrMaster;)
{
for (;
*ptrMaster && *ptrMaster != *strTarget;
ptrMaster++);
ptrTarget = strTarget;
*ptrIndex = ptrMaster - strMaster;
ptrHandle = 0;
for (;
*ptrTarget
?
(*ptrMaster
?
(*ptrMaster==*ptrTarget ? TRUTH : FALSITY)
:
FALSITY)
:
(booFound = TRUTH, FALSITY);
ptrMaster++, ptrTarget++)
{
if (ptrHandle = 0
&&
ptrMaster > strMaster
&&
*ptrMaster == *strTarget)
ptrHandle = ptrTarget;
}
if (booFound) return strMaster + *ptrIndex;
if (ptrHandle) ptrMaster = ptrHandle + 1;
}
*ptrIndex = 0;
return 0;
}

char * strstr(char *strMaster, char *strTarget)
{
int ptrIndex = 0;
return strstrWithIndex(strMaster, strTarget, &ptrIndex);
}

int main(void)
{
char *ptrIndex1 = NULLITY;
int intIndex1 = 0;
printf("strstr\n\n");
printf("Expect 0: %x\n", *strstr("", ""));
printf("Expect '0': '%c'\n", *strstr("0123456789", ""));
printf("Expect 0: %d\n", strstr("", "0"));
printf("Expect 0: %d\n", strstr("Here", "There"));
ptrIndex1 = strstrWithIndex("There", "here", &intIndex1);
printf("Expect 1: %d\n", intIndex1);
ptrIndex1 = strstrWithIndex("They seek him here",
"here",
&intIndex1);
printf("Expect 14: %d\n", intIndex1);
ptrIndex1 = strstrWithIndex("They seek him there",
"here",
&intIndex1);
printf("Expect 15: %d\n", intIndex1);
ptrIndex1 = strstrWithIndex
("The clc regs seek him everywhere",
"here",
&intIndex1);
printf("Expect 28: %d\n", intIndex1);
printf("Expect 'h': '%c'\n", *ptrIndex1);
ptrIndex1 = strstrWithIndex
("Is he in Heaven? Or in Hell?",
"?",
&intIndex1);
printf("Expect 15: %d\n", intIndex1);
printf("Expect '?': '%c'\n", *ptrIndex1);
ptrIndex1 = strstrWithIndex
("That damn'd elusive Spinoza won't tell!",
"Spinoza",
&intIndex1);
printf("Expect 20: %d\n", intIndex1);
printf("Expect 'p': '%c'\n", *(ptrIndex1+1));
printf("Expect '0': '%c'\n", *strstr("0123456789", "0"));
printf("Expect '1': '%c'\n", *strstr("0123456789", "1"));
printf("Expect '0': '%c'\n", *strstr("0123456789", "0"));
printf("Expect '9': '%c'\n", *strstr("0123456789", "9"));
printf("Expect '5': '%c'\n",
*strstr("0123456789", "345") + 2);
printf("Expect '8': '%c'\n", *strstr("0123456789", "89"));
ptrIndex1 = strstrWithIndex("0123456789A89AB",
"89AB",
&intIndex1);
printf("Expect 11: %d\n", intIndex1);
return 0;
}
 
S

spinoza1111

In the replace() program of last month's flame festival, a little
program was trying to get out. Here it is: an implementation of strstr
including a call that returns the offset of the found substring. Two
hours including all comments and dedicatory ode, written for this
occasion.

#include <stdlib.h>
#include <stdio.h>

// ***************************************************************
// *                                                             *
// * strstr                                                      *
// *                                                             *
// * This function (strstr) finds a string, probably as fast as  *
// * possible without extra memory usage over and above brute    *
// * force.                                                      *
// *                                                             *
// * In searching a Nul terminated string for a substring, there *
// * are logically three possibilities in a left to right        *
// * traversal of the master string that (1) looks for the       *
// * first character of the target and then (2) matches all the  *
// * remaining characters:
// *                                                             *
// *      *  (Erroneous): on the failure of a partial match,     *
// *         restart at the first nonmatching character. This is *
// *         fast but wrong, since the matching string may       *
// *         overlap the partial match.                          *
// *                                                             *
// *      *  (Too slow): on the failure of a partial match, start*
// *         one past the first character (of the partial match) *
// *                                                             *
// *      *  (Just right): while matching characters, note the   *
// *         leftmost character in the searched string, to the   *
// *         right of the first matched character, that matches  *
// *         both that character and, of course, the first       *
// *         character of the target.                            *
// *                                                             *

One reason for submitting this was to show how very wrong it is to
claim that "nothing useful is learnt in uni".

Note, please, the three-way reasoning in the above. It selects three
cases which seem almost tangential to the focus of the code, where in
business it's sooooo important to be focused (in a pseudo-scientific
spirit that deserves to be described as the quick, attentive motions
of the servant and not the attention of the scientist).

But in logic the alternatives "just happen" to exhaust all possible
cases! Which means that if we focus as the scientist and not the
servant on each case in turn, we've proved something, which rarely, in
my experience, happens in the profit and rent seeking world.

I first encountered this type of disection, trisection, dilemma and
trilemma in a book on formal automata, and would have preferred to
encounter it in a college classroom cleared of thugs.
 
S

spinoza1111

I don't see the problem. "My" strstr simply assumes it will be passed
a Nul terminated string. If it falls off the edge of a cliff, isn't
this the user's problem?

I grant that there is a genuine problem in C tools that modify memory
such as printf. But in general, strstr and suchlike tools have no way
of protecting themselves that I'm aware of against bad data.

I suppose that in rel. 1.3 I should check for NULL values of strMaster
and strTarget to forestall a Pedant Attack. But there is no SIMPLE
way, is there, of interrupting yourself as in a C Sharp try..catch
when you refer to memory that is not yours when a string without a Nul
terminator is passed to you.
 
E

Ersek, Laszlo

Problem with strstr is that there is no strnstr.
For example:

if(!strcmp(readBuf_,"\r\n"))
{
rc = true;
}
else if((end = strstr(readBuf_,"\r\n\r\n")) && size_t(end-readBuf_)<=readBufSize_)
{
res_.append(readBuf_,end-readBuf_);
rc = true;
}

Disregard res_ as it is c++, but c++ doesn't have strnstr either!
How to use strstr safely when there is o strnstr and you work
with httpd 512 byte buffer eg?

(Presumably I have no idea of the real topic of this thread, but I won't
let that disturb me.)

If you work with a "httpd buffer", you'll make that "char unsigned", not
plain char. Then you won't use strstr() but memmem() (not standard C) or
your own equivalent implementation.

#define _GNU_SOURCE
#include <string.h>

void *memmem(const void *haystack, size_t haystacklen,
const void *needle, size_t needlelen);

Finally, I hate to pull the EBCDIC card again (chill out, Kaz :) --
simply killfile me if you haven't done so yet), but "\r\n" is *not* what
you mean. RFC 2616 2.2 "Basic Rules" says,

CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>

So that would be, unless you say "this program works only with ASCII":

{
char unsigned buffer[BUFSIZE];
size_t loaded;

/* disservice to whomever signs your paycheck */
static const char unsigned crlf[] = { 0x0Du, 0x0Au };

char unsigned *found;

/* ... */

found = memmem(buffer, loaded, crlf, sizeof crlf);

/* ... */
}

lacos
 
E

Ersek, Laszlo

One question here. In c++ Im pretty sure that when you write 0xff
it defaults to unsigned type? Is that same in C or different.

C90 6.1.3.2 Integer constants

----v----
hexadecimal-constant
0x hexadecimal-digit
0X hexadecimal-digit
hexadecimal-constant hexadecimal-digit

[...]

The type of an integer constant is the first of the corresponding list
in which its value can be represented. Unsuffixed decimal: int, long
int, unsigned long int; unsuffixed octal or hexadecimal: int, unsigned
int, long int, unsigned long int; [...]
----^----

Same effect by C99 6.4.4.1 "Integer constants" p5.

C++98 and C++03 2.13.1 "Integer literals" p2:

----v----
[...] If it is octal or hexadecimal and has no suffix, it has the first
of these types in which its value can be represented: int, [...]
----^----

-> 0xff has type int "everywhere".

lacos
 
S

spinoza1111

You're arguing with someone who is not a first-year CS student and
claims to have needed TWO HOURS to implement strstr.

Peter, once more:

The time it takes is not even a crude measure of competence save in
the nastiest form of business office, such as yours, in which you post
the following Coding Horror:

while ((prefix[prefix_len - 1] == '/') && (prefix_len > 0))

in order to steal intellectual production and transform it into
intellectual property of Wind River Systems, by getting unsuspecting
people to find your bugs. The above bug demonstrates even more clearly
than your off by one strlen, your script kiddie %s filter that didn't
find %s and "the heap is a DOS term" that you, sir, are incompetent.

I'm not a subservient, autistic little code monkey as you increasingly
seem to be who makes messes in minutes and then expects others to fix
or forgive his errors, while assaulting hard working and best-selling
computer authors and their families, you dig me?

Furthermore, you missed the part where I said that the documentation
(which is ten times more literate and well formatted than the crap you
published as the pseudo root) and dedicatory ode (an original poem
recapitulating the essay) was part of the two hours.

Finally, you need to stop opening your punk byotch mouth about what
people do in first-year cs classes. I've TAUGHT computer science,
asshole. You were afraid to take ANY computer science classes
whatsoever, and you majored in the favorite college major of people
who can neither write nor do mathematics, that being computer science.

So, once more, asshole. How do you know the assignments in a first
year CS class?

You backstabbed Herb Schildt and paid your way into your current job,
and you're here for commercial gain as an employee of Wind River
Systems. I suggest you leave.
 
S

spinoza1111

Maybe one hour and fifty minutes was taken writing the poem in the
comments???

Probably forty minutes to code, forty minutes to document, forty
minutes to write the poem. But only in nasty little business offices
is time taken the sole measure of quality. In my book, someone who
takes extra time loves his work and is trying to do a better job.
 
S

spinoza1111

You're arguing with someone who is not a first-year CS student and
claims to have needed TWO HOURS to implement strstr.

Peter, once more:

The time it takes is not even a crude measure of competence save in
the nastiest form of business office, such as yours, in which you post
the following Coding Horror:

while ((prefix[prefix_len - 1] == '/') && (prefix_len > 0))

in order to steal intellectual production and transform it into
intellectual property of Wind River Systems, by getting unsuspecting
people to find your bugs. The above bug demonstrates even more clearly
than your off by one strlen, your script kiddie %s filter that didn't
find %s and "the heap is a DOS term" that you, sir, are incompetent.

I'm not a subservient, autistic little code monkey as you increasingly
seem to be who makes messes in minutes and then expects others to fix
or forgive his errors, while assaulting hard working and best-selling
computer authors and their families, you dig me?

Furthermore, you missed the part where I said that the documentation
(which is ten times more literate and well formatted than the crap you
published as the pseudo root) and dedicatory ode (an original poem
recapitulating the essay) was part of the two hours.

Finally, you need to stop opening your punk byotch mouth about what
people do in first-year cs classes. I've TAUGHT computer science,
asshole. You were afraid to take ANY computer science classes
whatsoever, and you majored in the favorite college major of people
who can neither write nor do mathematics, that being computer science.

Correction: "that being psychology", not "that being "computer
science".

Peter, I'd resolved to stay away from the pseudo root code but then
you came in here. I've complained to Apress about your behavior. The
gloves are off, and I will win.
 
S

spinoza1111

The first two releases had a bug: after a one character "handle" is
found we set the ptrIndex to the master string at one past the handle:
this is incorrect, for the start of the main loop will then search for
the first character of the NEXT occurence, missing the overlapping
occurence.

Here is the corrected code.

#include <stdlib.h>
#include <stdio.h>

// ***************************************************************
// * *
// * strstr *
// * *
// * This function (strstr) finds a string, probably as fast as *
// * possible without extra memory usage over and above brute *
// * force. *
// * *
// * In searching a Nul terminated string for a substring, there *
// * are logically three possibilities in a left to right *
// * traversal of the master string that (1) looks for the *
// * first character of the target and then (2) matches all the *
// * remaining characters:
// * *
// * * (Erroneous): on the failure of a partial match, *
// * restart at the first nonmatching character. This is *
// * fast but wrong, since the matching string may *
// * overlap the partial match. *
// * *
// * * (Too slow): on the failure of a partial match, start*
// * one past the first character (of the partial match) *
// * *
// * * (Just right): while matching characters, note the *
// * leftmost character in the searched string, to the *
// * right of the first matched character, that matches *
// * both that character and, of course, the first *
// * character of the target. *
// * *
// * C H A N G E R E C O R D --------------------------------- *
// * DATE PROGRAMMER DESCRIPTION OF CHANGE *
// * -------- ---------- --------------------------------- *
// * 03 18 10 Nilges Version 1.0 *
// * *
// * 03 19 10 Nilges Version 1.1 *
// * *
// * 1. Incorporates Pete's suggestion*
// * that a null target string is *
// * always found at the start of *
// * the master string. *
// * *
// * 2. Results display enhanced *
// * *
// * 03 19 10 Nilges Version 1.11: bug: ptrMaster was *
// * incorrectly set one past the *
// * ptrHandle *
// * *
// * ----------------------------------------------------------- *
// * *
// * To find a string, oh Muse! I sing, inside another String! *
// * Alternatives to me come in Three, ah, that's the thing: *
// * For the one thing for which the Wise must watch is mayhap, *
// * Partial occurences that most melancholy, overlap. *
// * The first is base, mechanical, low, and tragicomical: *
// * It's to restart from the previous beginning plus but One *
// * Oh what Mayhem to true Programming is thereby, done! *
// * But the job it will do, as did Hercules, *
// * His Labors for the Goddess cruel in Seneca's tragedies: *
// * Arduously and ignobly like unto the meanest Hind *
// * That knoweth not his Elbow from his Behind. *
// * The second is worse, a boner, a solecism, and a Seebach: *
// * The second restarts at the character that doth match! *
// * Oh muse! Such hellish Sights before me yawn: *
// * But be assur'd, 'tis darkest just before the Dawn. *
// * Shout for Victory, oh Thrace, and smite the Harp, and Grin: *
// * For lo, we start at the leftmost "handle" of the string *
// * When it occureth in *
// * The tragic partial match that hath failed us. *
// * If no such handle exists, then we can restart *
// * At the point of match failure: no, 'tis not a brain fart. *
// * Now we spy our magic bus: *
// * For this is the best Al Gore ithm *
// * That we can hope for in C, a language without Rhyme, or *
// * for that matter, Oh Muse! rhythm. *
// * *
// ***************************************************************

#define TRUTH -1
#define FALSITY 0
#define NULLITY 0

char * strstrWithIndex(char *strMaster,
char *strTarget,
int *ptrIndex)
{
char *ptrMaster = NULLITY;
char *ptrTarget = NULLITY;
char *ptrHandle = NULLITY;
int booFound = FALSITY;
*ptrIndex = 0; // Rel. 1.1
if (!*strTarget) return strMaster; // Rel. 1.1
if (!*strMaster) return 0; // Rel. 1.1
for (ptrMaster = strMaster; *ptrMaster;)
{
for (;
*ptrMaster && *ptrMaster != *strTarget;
ptrMaster++);
ptrTarget = strTarget;
*ptrIndex = ptrMaster - strMaster;
ptrHandle = 0;
for (;
*ptrTarget
?
(*ptrMaster
?
(*ptrMaster==*ptrTarget ? TRUTH : FALSITY)
:
FALSITY)
:
(booFound = TRUTH, FALSITY);
ptrMaster++, ptrTarget++)
{
if (ptrHandle = 0
&&
ptrMaster > strMaster
&&
*ptrMaster == *strTarget)
ptrHandle = ptrTarget;
}
if (booFound) return strMaster + *ptrIndex;
if (ptrHandle) ptrMaster = ptrHandle; // Rel. 1.11 bug fix
}
*ptrIndex = 0;
return 0;
}

char * strstr(char *strMaster, char *strTarget)
{
int ptrIndex = 0;
return strstrWithIndex(strMaster, strTarget, &ptrIndex);
}

int main(void)
{
char *ptrIndex1 = NULLITY;
int intIndex1 = 0;
printf("strstr\n\n");
printf("Expect 0: %x\n", *strstr("", ""));
printf("Expect '0': '%c'\n", *strstr("0123456789", ""));
printf("Expect 0: %d\n", strstr("", "0"));
printf("Expect 0: %d\n", strstr("Here", "There"));
ptrIndex1 = strstrWithIndex("There", "here", &intIndex1);
printf("Expect 1: %d\n", intIndex1);
ptrIndex1 = strstrWithIndex("They seek him here",
"here",
&intIndex1);
printf("Expect 14: %d\n", intIndex1);
ptrIndex1 = strstrWithIndex("They seek him there",
"here",
&intIndex1);
printf("Expect 15: %d\n", intIndex1);
ptrIndex1 = strstrWithIndex
("The clc regs seek him everywhere",
"here",
&intIndex1);
printf("Expect 28: %d\n", intIndex1);
printf("Expect 'h': '%c'\n", *ptrIndex1);
ptrIndex1 = strstrWithIndex
("Is he in Heaven? Or in Hell?",
"?",
&intIndex1);
printf("Expect 15: %d\n", intIndex1);
printf("Expect '?': '%c'\n", *ptrIndex1);
ptrIndex1 = strstrWithIndex
("That damn'd elusive Spinoza won't tell!",
"Spinoza",
&intIndex1);
printf("Expect 20: %d\n", intIndex1);
printf("Expect 'p': '%c'\n", *(ptrIndex1+1));
printf("Expect '0': '%c'\n", *strstr("0123456789", "0"));
printf("Expect '1': '%c'\n", *strstr("0123456789", "1"));
printf("Expect '0': '%c'\n", *strstr("0123456789", "0"));
printf("Expect '9': '%c'\n", *strstr("0123456789", "9"));
printf("Expect '5': '%c'\n",
*strstr("0123456789", "345") + 2);
printf("Expect '8': '%c'\n", *strstr("0123456789", "89"));
ptrIndex1 = strstrWithIndex("0123456789A89AB",
"89AB",
&intIndex1);
printf("Expect 11: %d\n", intIndex1);
return 0;
}
 
B

Ben Bacarisse

spinoza1111 said:
The first two releases had a bug: after a one character "handle" is
found we set the ptrIndex to the master string at one past the handle:
this is incorrect, for the start of the main loop will then search for
the first character of the NEXT occurence, missing the overlapping
occurence.

You have a == vs = type that makes this explanation unlikely. As far
as I can tell, ptrHandle is never anything other a null pointer and
none of the code involving it as any effect at all. That includes
this supposed fix.

<snip>
 
S

spinoza1111

You have a == vs = type that makes this explanation unlikely.  As far
as I can tell, ptrHandle is never anything other a null pointer and
none of the code involving it as any effect at all.  That includes
this supposed fix.

if (ptrHandle = 0 <- wrong, should be ==
&&
ptrMaster > strMaster
&&
*ptrMaster == *strTarget)
ptrHandle = ptrTarget;

Oops....very good call, thanks. I clearly need more test cases: the
overlaps are never found.

I will post a fixed version after I get back from class.
 
D

Dr Malcolm McLean

You're arguing with someone who is not a first-year CS student and
claims to have needed TWO HOURS to implement strstr.

Is there any genuine point in pointing out errors?
A quick and nasty strstr should only take a minute or so to write.

/*
quick and nasty strstr. Untested. Posted to comp.lang.c "as is"
without warrantry or guarantee of any kind, including implied
warrantry of merchanability or fitness for any particular purpose.
*/
char *mystrstr(char *str, char *sub)
{
size_t i; /* If you think this is ugly, join the campaign for 64-bit
ints */
while(*str)
{
for(i=0;str==sub && str;i++);
if(!sub)
return str;
str++;
}
return 0;
}

However it's a bit cryptic, particularly the for loop.
Whilst there's always the argument that as long as the interfaces are
nice, the code doesn't matter, I'd like to see a rather better job.
Then it will return a match to the first character if sub is the
empty string. I don't know offhand whether this is allowed. I'd have
to check the standard for a real implementation intended to be shipped
to someone else.
Two hours isn't unreasonable for a production-quality strstr.
 
S

spinoza1111

You're arguing with someone who is not a first-year CS student and
claims to have needed TWO HOURS to implement strstr.
Is there any genuine point in pointing out errors?

A quick and nasty strstr should only take a minute or so to write.

/*
   quick and nasty strstr. Untested. Posted to comp.lang.c "as is"
without warrantry or guarantee of any kind, including implied
warrantry of merchanability or fitness for any particular purpose.
*/
char *mystrstr(char *str, char *sub)
{
  size_t i; /* If you think this is ugly, join the campaign for 64-bit
ints */
  while(*str)
  {
    for(i=0;str==sub && str;i++);
    if(!sub)
       return str;
    str++;
  }
  return 0;

}

However it's a bit cryptic, particularly the for loop.
Whilst there's always the argument that as long as the interfaces are
nice, the code doesn't matter, I'd like to see a rather better job.
Then it  will return a match to the first character if sub is the
empty string. I don't know offhand whether this is allowed. I'd have
to check the standard for a real implementation intended to be shipped
to someone else.
Two hours isn't unreasonable for a production-quality strstr.


Well, I'm not done, as Ben has pointed out. There's a serious bug in
the code, and when I fixed it, further bugs arose.

However, I'd remark that one of the reasons I left the programming
field for far greener pastures after a successful thirty year career,
apart from not having to work with personalities like Seebach, was the
constant subordination of human needs (including the human need for
creative expression and helping one's fellow man) to "business" needs,
corporate speak for the profits of the few at the expense of the many.

Therefore, I'm not trying to either create a "production" quality
strstr nor even an instructional strstr. Nobody needs any lessons in
practical programming since it is rare today and today involves too
many lessons in subservience and backstabbing which I am unqualified
to teach.

Instead, as what Kant would call "a citizen of a better world" I am
writing code slowly to create solutions optimal in all directions
(efficiency and clarity) "as if" the world as presently constituted
needed any such thing. And while the efficiency is mathematical and
the same for everyone, it appears that my predicates of clarity
(greater use of literate English, somewhat longer identifiers, pre-
Szymonyi "Hungarian") are very different from those of others,
although the sustainability of the criminal argot of most programming
styles today is questionable.

Therefore, this, bugs and all, is a purely artistic gesture with
religious overtones, referring back to a time when art and religion
were fused. It's cave painting at the end of time.

PS. None of this is tongue in cheek and I shit you not.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,906
Latest member
SkinfixSkintag

Latest Threads

Top