Efficency and the standard library

Dann Corbit · Feb 11, 2010

"Dann Corbit" <[email protected]> ha scritto nel messaggio
[snip]

For a standard library, the *most* important thing is correctness. IOW,

Click to expand...

for a standard library function
the *most* important thing is the easy to use
and the error handling well think

Click to expand...

It doesn't matter how easy to it is to use if it gives you the wrong
answer.
It doesn't matter how well it handles errors if it gives you the wrong
answer.
It doesn't matter how fast it is if it gives you the wrong answer.
It doesn't matter how much memory it uses if it gives you the wrong
answer.

Correctness comes first. Then we can argue about everything else.

Besides which, we don't get to change ease of use. The interface and
documentation are already fully specified.

Michael Foukarakis · Feb 12, 2010

But wouldn't that be Extreme Programming?

(cfhttp://groups.google.com/group/comp.lang.c/msg/89ba3ef6554c9810?hl=en)

Is that a Bad Thing™?

spinoza1111 · Feb 12, 2010

This may have gotten buried given that it started in a Nilges thread,
but it's actually pretty interesting.

There was some discussion of algorithms to perform the task:
Inputs: target string, replacement string, original string
Output: a newly-allocated string containing the original
string with each copy of the target string replaced with the
replacement string.

Here's a hunk of mine:

for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
++count;
t += inlen;
}

Here's a hunk of another one:

These hunks are essentially performing the same part of the core loop;
find the next occurrence of the target string in the original string.

The second one was written by Edward Nilges ("spinoza1111"). He offers
in its defense the assertion that it's the "best" algorithm. (Nevermind
that it's got a bug; the bug could be fixed easily.)

Don't lie, and don't try to bring me down to your low level. When you
posted this code, the bug had been fixed.

You are incredibly dishonest. You were dishonest when you advanced
your career by trying to destroy Herb Schildt, and here, on Feb 11,
you posted an OLD version of the code that contained the bug that that
the search is resumed after the end of the partial match, missing
overlapping strings.

I found, reported and fixed this bug on Feb 10 and on that date you
acknowledged that you had not found it. Also on that date, I made a
code improvement that works based on insight provided by the bug:
taking the first occurrence of the handle after the start of a partial
match as the restart point.

You have dishonestly presented the above code as my latest version in
order to dishonestly recover your reputation for competence, but you
are both incompetent and dishonest.

My thinking:

The first is better, not because it's less buggy (that could be fixed), but

You continually post solutions with bugs: you do not fix your crap:
and yet you have the nerve to disrespect others based on their bugs.

because it's simpler to understand. It may, or may not, be less efficient.
However, efficiency will vary widely with input characteristics; for some
cases, the arguable inefficiencies may be completely irrelevant, while in
others, they'd be significant.

But you should always write the simpler one first, and wait until you've
profiled the code and verified that there's a bottleneck, and you know where
it is, before trying to optimize. You may, or may not, be able to outperform
a given library's implementation of strstr(). If you have unusual inputs,
you have a pretty good shot at it... But it may not be worth it. The
complicated example has had at least one bug identified, and there may be
others. It's extremely hard to read, partially because of the poor
naming, and partially because of the counter-idiomatic usage. But mostly,
it's a lot more work to write and maintain, and for no known benefit.

It's good to think a little about efficiency, but write for clarity and
correctness first, so you have a known-working program to check your
results if you later need to modify it for speed.

But: your code, as you concede, isn't working!

What a jerk!

spinoza1111 · Feb 12, 2010

Seebs said:
Seebs said:

This may have gotten buried given that it started in a Nilges thread,
but it's actually pretty interesting.

Click to expand...

There was some discussion of algorithms to perform the task:
Inputs: target string, replacement string, original string
Output: a newly-allocated string containing the original
string with each copy of the target string replaced with the
replacement string.

Click to expand...

Here's a hunk of mine:

Click to expand...

for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
++count;
t += inlen;
}

Click to expand...

Here's a hunk of another one:

Click to expand...

These hunks are essentially performing the same part of the core loop;
find the next occurrence of the target string in the original string.

Click to expand...

The second one was written by Edward Nilges ("spinoza1111"). He offers
in its defense the assertion that it's the "best" algorithm. (Nevermind
that it's got a bug; the bug could be fixed easily.)

Click to expand...

My thinking:

Click to expand...

The first is better, not because it's less buggy (that could be fixed), but
because it's simpler to understand. It may, or may not, be less efficient.
However, efficiency will vary widely with input characteristics; for some
cases, the arguable inefficiencies may be completely irrelevant, while in
others, they'd be significant.

Click to expand...

I don't think you can dismiss the bugs so easily. I would argue that
it is not a coincidence that the more complex code has more bugs. I
can't write code that looks likespinoza1111'scode and get it right.
I *have* to write simpler code, broken into simple functions, or I have
no chance of getting it to be correct.

For example, the latest improvement introduced another bug[1] and I
don't think that is simple carelessness. Without a re-write I suspect
almost any change is as likely to introduce a bug as fix one (I tried
to see where the error was, but I gave up). If I was paid to fix it,
I'd simply start again and it would end a up as I have already posted,
though to mirror the behaviour I'd now have to add some argument
checking.

<snip>

[1] Apparently when the target string is not present in the source
string: e.g. replace("a", "x", "b").

I have returned to this little project following my sabbatical to let
other people post their views.

Good call.

Replace "x" by "b" in "a"
Expect "a":
"ab"
Replacements expected: 0: replacements: 1
Assertion failed

I am fixing the problem, now, because unlike Seebach, I fix my bugs.
It's 1:09 AM 13 Feb.

Gung hey fat choi.

spinoza1111 · Feb 12, 2010

Seebs said:
Seebs said:

This may have gotten buried given that it started in a Nilges thread,
but it's actually pretty interesting.

Click to expand...

There was some discussion of algorithms to perform the task:
Inputs: target string, replacement string, original string
Output: a newly-allocated string containing the original
string with each copy of the target string replaced with the
replacement string.

Click to expand...

Here's a hunk of mine:

Click to expand...

for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
++count;
t += inlen;
}

Click to expand...

Here's a hunk of another one:

Click to expand...

These hunks are essentially performing the same part of the core loop;
find the next occurrence of the target string in the original string.

Click to expand...

The second one was written by Edward Nilges ("spinoza1111"). He offers
in its defense the assertion that it's the "best" algorithm. (Nevermind
that it's got a bug; the bug could be fixed easily.)

Click to expand...

My thinking:

Click to expand...

The first is better, not because it's less buggy (that could be fixed), but
because it's simpler to understand. It may, or may not, be less efficient.
However, efficiency will vary widely with input characteristics; for some
cases, the arguable inefficiencies may be completely irrelevant, while in
others, they'd be significant.

Click to expand...

I don't think you can dismiss the bugs so easily. I would argue that
it is not a coincidence that the more complex code has more bugs. I
can't write code that looks likespinoza1111'scode and get it right.
I *have* to write simpler code, broken into simple functions, or I have
no chance of getting it to be correct.

For example, the latest improvement introduced another bug[1] and I
don't think that is simple carelessness. Without a re-write I suspect
almost any change is as likely to introduce a bug as fix one (I tried
to see where the error was, but I gave up). If I was paid to fix it,
I'd simply start again and it would end a up as I have already posted,
though to mirror the behaviour I'd now have to add some argument
checking.

<snip>

[1] Apparently when the target string is not present in the source
string: e.g. replace("a", "x", "b").

It took me about an hour to fix this because of some blind alleys. See
the Change Record for 13 Feb for the explanation of the bug, a failure
to anticipate that there could be a situation when the main scan
pointer (ptrIndex1) and the main target pointer (ptrIndex2) both point
to Nul but the strings do not match.

In addition, I discovered that there are cases when I initialize the
secondary scan pointer ptrIndex3 to one past the Nul that terminates
the master string, and this has also been fixed.

The purpose of my style and approach is to reduce the bug rate
exponentially going forward. I don't post code while saying "it has a
bug, but it's easy to fix". This was Peter Seebach's initial mistake:
he posted a crappy solution to replacing %s that used a character scan
that fails when a percent occurs in the input, admitting the bug
without bothering to fix it.

This is why I have retained so many tests, and why I wrote a TESTER
macro, and why I formatted the code more carefully than most little
code monkeys care to format their code.

This is the arrogant programmer who expects others to clean up his
messes for him (and lies about people to get his way, I might add).

Having said all this, I am not completely happy with my solution. This
is because I have no assurance that there might be other cases where
it fails, and this in turn is because it's just code, not theory. In
terms of CS theory, it's doing two things at once: finding strings,
and replacing them. As I have said, it arbitrarily goes left to right
and doesn't handle overlapping hits in the way some applications might
require replace("banana", "ana", "xxx") to yield bxxxxx.

[In rereading Cormen's Algorithms the other night on string matching,
which treats the problem separate from match/replace for solid
theoretical reasons, I noticed that one of his students brought this
to his attention.]

Programmers who backstab and lie, who mock others who fix their bugs
for having the bugs, while grinning about their own, and who present
code with bugs that they won't fix, create buggy solutions out of
dishonesty. They helped to cause the credit crisis, because in many
cases, securitized mortgages and their derivative contracts were never
examined for loops such that one derivative referred to itself through
a chain. It increasingly appears that bugs in software lie at the
bottom of the Toyota recall.

Dijkstra is right. Programming IS too hard for Republicans and other
liars (Peter Seebach is on record as a Bush supporter, at least in
2000).

Here is the latest version, followed by its console output.

// ***************************************************************
// * *
// * replace() demo *
// * *
// * Demonstrates how to replace non-NUL-defined strings in C *
// * using a simple function, and bypassing string.h. *
// * *
// * C H A N G E R E C O R D --------------------------------- *
// * DATE PROGRAMMER DESCRIPTION OF CHANGE *
// * -------- ---------- --------------------------------- *
// * 02 07 10 Nilges Version 1.0 *
// * *
// * 02 07 10 Nilges Bug: partial matches not handled *
// * correctly: need to iterate search *
// * for match. *
// * *
// * 02 07 10 Nilges 1. Santosh suggested tests *
// * 2. Heathfield put the boot in re *
// * including malloc *
// * *
// * 02 07 10 Nilges 1. Remove string.h use and code *
// * strlen by hand *
// * 2. Add comment block and comments*
// * inline. *
// * 3. free() storage *
// * 4. Use macro for testing *
// * 5. Bug: calculation of *
// * lngNewLength used incorrect *
// * index (intIndex3 instead of 2)*
// * which caused a memory leak. *
// * *
// * 02 07 10 Nilges 1. Bug: Ike Naar test failed. *
// * At end of scan loop, main *
// * string pointer was not *
// * correctly updated from index3 *
// * *
// * 02 07 10 Nilges Added new Santosh test *
// * *
// * 02 07 10 Nilges Added new Ike Naar test *
// * *
// * 02 08 10 Nilges 1. Added some new comments *
// * 2. Make "check for a complete *
// * match "structured" by means *
// * of a tested assignment *
// * 3. Get rid of "replace at end" *
// * evilness: the only time this *
// * flag is meaningful is in the *
// * LAST segment. *
// * 4. Return replace count *
// * 5. TESTER macro assertion added *
// * *
// * 02 10 10 Nilges 1. Bug fix: in a partial match, *
// * the main scan index is set to *
// * one past the end, which misses*
// * full matches that start *
// * between the first character of*
// * the partial match and its end.*
// * *
// * No longer updating the main *
// * scan index (ptrIndex1) to the *
// * point of partial match *
// * failure: setting it one past *
// * the first character of the *
// * partial match. *
// * *
// * 2. Line up expected & actual *
// * results per excellent *
// * suggestion (who made this?) *
// * *
// * 3. After a partial match, update *
// * the main handle search index *
// * (ptrIndex1) to the first *
// * occurrence of the handle *
// * character after the start of *
// * the partial match, or to the *
// * index of the unmatched char. *
// * *
// * 021310 Nilges Bug: failed to handle a one-char *
// * replace (of a by x in b) since *
// * we set ptrIndex2 to point to NUL *
// * which made it appear that there *
// * was a match. When the main index *
// * to the master string (ptrIndex1) *
// * goes off the end of a cliff, we *
// * needed to break out of the search *
// * loop. *
// * *
// * This also entailed setting the *
// * target index ptrIndex2 to 0 at the*
// * beginning of each search so that *
// * it is not erroneously used to *
// * indicate that there's an insert *
// * needed at the end. *
// * *
// * I have also added additional tests*
// * to verify that the code works *
// * when the string and target end *
// * at the same time. *
// * ----------------------------------------------------------- *
// * *
// * "In the near future we shall have to live with the *
// * superstition that programming is 'so easy that even a *
// * Republican can do it!'" *
// * *
// * - E. W. Dijkstra *
// * *
// * *
// ***************************************************************
#include <stdio.h>
#include <stdlib.h>
// ***** Segmentation *****
struct TYPsegmentstruct
{ char * strSegment;
long lngSegmentLength;
struct TYPsegmentstruct * ptrNext; };
// ---------------------------------------------------------------
// Calculate string length
//
//
long strLength(char *strInstring)
{
char *ptrInstring;
for (ptrInstring = strInstring; *ptrInstring; ptrInstring++);
return ptrInstring - strInstring;
}

// ---------------------------------------------------------------
// Replace target by replacement string in master string
//
//
// Caution: the string returned by this function should be freed.
//
//
char * replace(char * strMaster,
char * strTarget,
char * strReplacement,
long * ptrReplacements)
{
char * ptrIndex0;
char * ptrIndex1;
char * ptrIndex2;
char * ptrIndex3;
char * ptrIndex4;
char * strNew;
char * strNewStart;
long lngNewLength;
long lngCount;
long lngReplacementLength;
struct TYPsegmentstruct * ptrSegmentStructStarts;
struct TYPsegmentstruct * ptrSegmentStruct;
struct TYPsegmentstruct * ptrSegmentStructPrev;
lngReplacementLength = strLength(strReplacement);
if (!*strTarget)
{
printf("Error in calling replace(): target can't be null");
abort();
}
ptrIndex1 = strMaster;
ptrSegmentStructPrev = 0;
lngNewLength = 0;
*ptrReplacements = 0;
while(*ptrIndex1)
{
ptrIndex0 = ptrIndex1;
ptrIndex2 = 0;
while (-1)
{
// --- Check for (one character) handle
for(;
*ptrIndex1 && *ptrIndex1 != *strTarget;
ptrIndex1++);
if (!*ptrIndex1) break;
// --- Check for complete match while remembering the
// --- last position of the handle
ptrIndex4 = 0;
for(ptrIndex2 = strTarget + 1,
ptrIndex3 = ptrIndex1 + 1;
*ptrIndex3
&&
*ptrIndex2
&&
*ptrIndex3 == *ptrIndex2;
ptrIndex3++, ptrIndex2++)
{
if (*ptrIndex3 == *strTarget
&&
ptrIndex4 == 0) ptrIndex4 = ptrIndex3;
}
// End test: check complete match, update main ptr past
// partial match while checking for end of loop
if ((!*ptrIndex2 ? ((*ptrReplacements)++, -1) : 0)
||
(!*ptrIndex3 ? (ptrIndex1 = ptrIndex3, -1) : 0))
break;
// Update the main search pointer
ptrIndex1 = (ptrIndex4 == 0 ? ptrIndex3 : ptrIndex4);
}
// --- Create new segment
if (!(ptrSegmentStruct =
malloc(sizeof(struct TYPsegmentstruct))))
abort();
ptrSegmentStruct->strSegment = ptrIndex0;
ptrSegmentStruct->lngSegmentLength =
ptrIndex1 - ptrIndex0;
ptrSegmentStruct->ptrNext = 0;
if (ptrSegmentStructPrev != 0)
ptrSegmentStructPrev->ptrNext = ptrSegmentStruct;
else
ptrSegmentStructStarts = ptrSegmentStruct;
ptrSegmentStructPrev = ptrSegmentStruct;
// --- Update mallocation length
lngNewLength += ptrSegmentStruct->lngSegmentLength +
(ptrIndex2 && !*ptrIndex2
?
lngReplacementLength
:
0);
// --- Get past end of target string & iterate
if (*ptrIndex1) ptrIndex1 = ptrIndex3;
}
// --- Allocate just enough storage for the new string
if (!(strNewStart = malloc(lngNewLength + 1))) abort();
// --- Build the new string whilst freeing the list
strNew = strNewStart;
ptrSegmentStruct = ptrSegmentStructStarts;
while (ptrSegmentStruct)
{
for (ptrIndex1 = ptrSegmentStruct->strSegment,
lngCount = 0;
lngCount < ptrSegmentStruct->lngSegmentLength;
ptrIndex1++, lngCount++, strNew++)
*strNew = *ptrIndex1;
if (ptrSegmentStruct->ptrNext
||
ptrIndex2 != 0 && !*ptrIndex2)
for (ptrIndex1 = strReplacement;
*ptrIndex1;
ptrIndex1++, ++strNew)
*strNew = *ptrIndex1;
ptrSegmentStructPrev = ptrSegmentStruct;
ptrSegmentStruct = ptrSegmentStruct->ptrNext;
free(ptrSegmentStructPrev);
}
*strNew = '\0';
return strNewStart;
}

// ---------------------------------------------------------------
// Statement-format test macro
//
//
#define TESTER(resultPtr, master, target, replacement, expected,
expectedReplacements, replacements) \
{ \
printf("Replace \"%s\" by \"%s\" in \"%s\"\n", \
(target), (replacement), (master)); \
printf("Expect \"%s\":\n \"%s\"\n", \
(expected), \
resultPtr = replace((master), \
(target), \
(replacement), \
&(replacements))); \
printf("Replacements expected: %d: replacements: %d\n", \
(expectedReplacements), \
(replacements)); \
if (!(strLength(resultPtr) \
== \
strLength(master) \
+ \
(strLength(replacement)-strLength(target)) \
* \
replacements)) \
printf("Assertion failed\n"); \
printf("\n\n"); \
free(resultPtr); \
}

// ---------------------------------------------------------------
// Main procedure
//
//
int main()
{
char *ptrResult;
long lngReplacements;
printf("\nReplace\n\n\n");
TESTER(ptrResult,
"1111123bbb1111123bbb11123bb11111231111112111111123",
"111123",
"ono",
"1onobbb1onobbb11123bb1ono1111112111ono",
4,
lngReplacements)
TESTER(ptrResult,
"bbb1111123bbbbb",
"111123",
"ono",
"bbb1onobbbbb",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid error",
"stupid error",
"miracle",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid error",
"stupid",
"miracle",
"a miracle error",
1,
lngReplacements)
TESTER(ptrResult,
"the stupid error",
"the stupid error",
"a miracle",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"the miracle",
"the",
"a",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"a miraclsnirpKamunkle",
"snirpKamunkle",
"e",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"a miraclesnirpKamunkle",
"a miracle",
"",
"snirpKamunkle",
1,
lngReplacements)
TESTER(ptrResult,
" a miraclesnirpKamunkle",
"a miracle",
"",
" snirpKamunkle",
1,
lngReplacements)
TESTER(ptrResult,
" a miraclesnirpKamunklea miraclea miracle",
"a miracle",
"",
" snirpKamunkle",
3,
lngReplacements)
TESTER(ptrResult,
"a miracle a miraclesnirpKamunkle a Miraclea miraclea
miracle",
"a miracle",
"",
" snirpKamunkle a Miracle",
4,
lngReplacements)
TESTER(ptrResult,
"a stupid errord",
"stupid error",
"miracle",
"a miracled",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid errod",
"stupid error",
"miracle",
"a stupid errod",
0,
lngReplacements)
TESTER(ptrResult,
"a sstupid error",
"stupid error",
"miracle",
"a smiracle",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid errorstupid error",
"stupid error",
"miracle",
"a miraclemiracle",
2,
lngReplacements)
TESTER(ptrResult,
"a stupid error stupiderror",
"stupid error",
"miracle",
"a miracle stupiderror",
1,
lngReplacements)
TESTER(ptrResult,
"bbbbbbbbbb",
"b",
"a",
"aaaaaaaaaa",
10,
lngReplacements)
TESTER(ptrResult,
"In the halls of R'yleh great %s lies dreaming",
"%s",
"Cthulu",
"In the halls of R'yleh great Cthulu lies dreaming",
1,
lngReplacements)
TESTER(ptrResult,
"%s%s%s%s%s%s",
"%s",
"Cthulu",
"CthuluCthuluCthuluCthuluCthuluCthulu",
6,
lngReplacements)
TESTER(ptrResult,
"banana",
"ana",
"oat",
"boatna",
1,
lngReplacements)
TESTER(ptrResult,
" a stupid errorstupid errorHeystupid errors",
"stupid error",
"+",
" a ++Hey+s",
3,
lngReplacements)
TESTER(ptrResult,
"foo barfoo barf",
"foo bar",
"bas",
"basbasf",
2,
lngReplacements)
TESTER(ptrResult,
"abab",
"ba",
"ba",
"abab",
1,
lngReplacements)
TESTER(ptrResult,
"abab",
"bab",
"boop",
"aboop",
1,
lngReplacements)
TESTER(ptrResult,
"banana",
"ana",
"ono",
"bonona",
1,
lngReplacements)
TESTER(ptrResult,
"a",
"x",
"b",
"a",
0,
lngReplacements)
TESTER(ptrResult,
"x",
"x",
"b",
"b",
1,
lngReplacements)
TESTER(ptrResult,
"egregious",
"egregious",
"egregious",
"egregious",
1,
lngReplacements)
TESTER(ptrResult,
"egregious",
"egregious",
"x",
"x",
1,
lngReplacements)
printf("\n\nTesting complete: check output carefully: \"Assertion
failed\" should not occur!\n\n");
return 0;
}

Replace

Replace "111123" by "ono" in
"1111123bbb1111123bbb11123bb11111231111112111111123"
Expect "1onobbb1onobbb11123bb1ono1111112111ono":
"1onobbb1onobbb11123bb1ono1111112111ono"
Replacements expected: 4: replacements: 4

Replace "111123" by "ono" in "bbb1111123bbbbb"
Expect "bbb1onobbbbb":
"bbb1onobbbbb"
Replacements expected: 1: replacements: 1

Replace "stupid error" by "miracle" in "a stupid error"
Expect "a miracle":
"a miracle"
Replacements expected: 1: replacements: 1

Replace "stupid" by "miracle" in "a stupid error"
Expect "a miracle error":
"a miracle error"
Replacements expected: 1: replacements: 1

Replace "the stupid error" by "a miracle" in "the stupid error"
Expect "a miracle":
"a miracle"
Replacements expected: 1: replacements: 1

Replace "the" by "a" in "the miracle"
Expect "a miracle":
"a miracle"
Replacements expected: 1: replacements: 1

Replace "snirpKamunkle" by "e" in "a miraclsnirpKamunkle"
Expect "a miracle":
"a miracle"
Replacements expected: 1: replacements: 1

Replace "a miracle" by "" in "a miraclesnirpKamunkle"
Expect "snirpKamunkle":
"snirpKamunkle"
Replacements expected: 1: replacements: 1

Replace "a miracle" by "" in " a miraclesnirpKamunkle"
Expect " snirpKamunkle":
" snirpKamunkle"
Replacements expected: 1: replacements: 1

Replace "a miracle" by "" in " a miraclesnirpKamunklea miraclea
miracle"
Expect " snirpKamunkle":
" snirpKamunkle"
Replacements expected: 3: replacements: 3

Replace "a miracle" by "" in "a miracle a miraclesnirpKamunkle a
Miraclea miraclea miracle"
Expect " snirpKamunkle a Miracle":
" snirpKamunkle a Miracle"
Replacements expected: 4: replacements: 4

Replace "stupid error" by "miracle" in "a stupid errord"
Expect "a miracled":
"a miracled"
Replacements expected: 1: replacements: 1

Replace "stupid error" by "miracle" in "a stupid errod"
Expect "a stupid errod":
"a stupid errod"
Replacements expected: 0: replacements: 0

Replace "stupid error" by "miracle" in "a sstupid error"
Expect "a smiracle":
"a smiracle"
Replacements expected: 1: replacements: 1

Replace "stupid error" by "miracle" in "a stupid errorstupid error"
Expect "a miraclemiracle":
"a miraclemiracle"
Replacements expected: 2: replacements: 2

Replace "stupid error" by "miracle" in "a stupid error stupiderror"
Expect "a miracle stupiderror":
"a miracle stupiderror"
Replacements expected: 1: replacements: 1

Replace "b" by "a" in "bbbbbbbbbb"
Expect "aaaaaaaaaa":
"aaaaaaaaaa"
Replacements expected: 10: replacements: 10

Replace "%s" by "Cthulu" in "In the halls of R'yleh great %s lies
dreaming"
Expect "In the halls of R'yleh great Cthulu lies dreaming":
"In the halls of R'yleh great Cthulu lies dreaming"
Replacements expected: 1: replacements: 1

Replace "%s" by "Cthulu" in "%s%s%s%s%s%s"
Expect "CthuluCthuluCthuluCthuluCthuluCthulu":
"CthuluCthuluCthuluCthuluCthuluCthulu"
Replacements expected: 6: replacements: 6

Replace "ana" by "oat" in "banana"
Expect "boatna":
"boatna"
Replacements expected: 1: replacements: 1

Replace "stupid error" by "+" in " a stupid errorstupid errorHeystupid
errors"
Expect " a ++Hey+s":
" a ++Hey+s"
Replacements expected: 3: replacements: 3

Replace "foo bar" by "bas" in "foo barfoo barf"
Expect "basbasf":
"basbasf"
Replacements expected: 2: replacements: 2

Replace "ba" by "ba" in "abab"
Expect "abab":
"abab"
Replacements expected: 1: replacements: 1

Replace "bab" by "boop" in "abab"
Expect "aboop":
"aboop"
Replacements expected: 1: replacements: 1

Replace "ana" by "ono" in "banana"
Expect "bonona":
"bonona"
Replacements expected: 1: replacements: 1

Replace "x" by "b" in "a"
Expect "a":
"a"
Replacements expected: 0: replacements: 0

Replace "x" by "b" in "x"
Expect "b":
"b"
Replacements expected: 1: replacements: 1

Replace "egregious" by "egregious" in "egregious"
Expect "egregious":
"egregious"
Replacements expected: 1: replacements: 1

Replace "egregious" by "x" in "egregious"
Expect "x":
"x"
Replacements expected: 1: replacements: 1

Testing complete: check output carefully: "Assertion failed" should
not occur!

spinoza1111 · Feb 12, 2010

I don't think you can dismiss the bugs so easily. I would argue that
it is not a coincidence that the more complex code has more bugs. I
can't write code that looks likespinoza1111'scode and get it right.
I *have* to write simpler code, broken into simple functions, or I have
no chance of getting it to be correct.

Click to expand...

You may be right on this. Certainly, this kind of thing is one of the
major reasons that I sometimes rewrite things a few times until I am
sure I can understand the logic just by looking at it. If I have to
think it through, usually that means something is wrong, or the problem
is extremely hard.

For example, the latest improvement introduced another bug[1] and I
don't think that is simple carelessness. Without a re-write I suspect
almost any change is as likely to introduce a bug as fix one (I tried
to see where the error was, but I gave up). If I was paid to fix it,
I'd simply start again and it would end a up as I have already posted,
though to mirror the behaviour I'd now have to add some argument
checking.

Click to expand...

I think it would be interesting to start from his code and see whether it
can be fixed by cleanups and refactoring. I bet it could, although it'd
certainly take longer than just writing one from scratch.

Well, unlike your original bug of scanning for % in two places as
ersatz for finding the string %s, it would be a simple matter for you
to use a modern editor to get rid of my Hungarian. And you cannot seem
to solve the problem from scratch without creating bugs that you won't
fix.

spinoza1111 · Feb 12, 2010

"I now suggest that we confine ourselves to the design and
implementation of intellectually manageable programs." Dijkstra

Amen, brother.

Dijkstra would say that we needed to disambiguate the two goals, and
that the unexpected complexity of "find and replace" results from the
fact that there are two things going on.

Perhaps we really need a string finder and a replacer which would
encapsulate left to right, right to left, and what to do on overlap.

However, the traditional way to do this can be slow, since it involves
creating a "table" of occurrences.

But two independent processes, a source and a sink, would handle the
problems nicely.

For example, the latest improvement introduced another bug[1] and I
don't think that is simple carelessness. Without a re-write I suspect
almost any change is as likely to introduce a bug as fix one (I tried
to see where the error was, but I gave up). If I was paid to fix it,
I'd simply start again and it would end a up as I have already posted,
though to mirror the behaviour I'd now have to add some argument
checking.

<snip>

Click to expand...

[1] Apparently when the target string is not present in the source
string: e.g. replace("a", "x", "b").

Click to expand...

spinoza1111 · Feb 12, 2010

"Dann Corbit" <[email protected]> ha scritto nel messaggio
[snip]

For a standard library, the *most* important thing is correctness. IOW,

Click to expand...

Click to expand...

for a standard library function
the *most* important thing is the easy to use
and the error handling well think

Click to expand...

It doesn't matter how easy to it is to use if it gives you the wrong
answer.
It doesn't matter how well it handles errors if it gives you the wrong
answer.
It doesn't matter how fast it is if it gives you the wrong answer.
It doesn't matter how much memory it uses if it gives you the wrong
answer.

Correctness comes first. Then we can argue about everything else.

True, which is why I've responded to each bug report. But note that in
many industrial programming environments, where programming is a pure
cost center and creativity isn't wanted, responding to bugs can be
renarrated as having a lot of bugs.

This is why Peter admits bugs...but doesn't fix them.

Anand Hariharan · Feb 12, 2010

"Dann Corbit" <[email protected]> ha scritto nel messaggio
[snip]

For a standard library, the *most* important thing is correctness. IOW,

Click to expand...

Click to expand...

for a standard library function
the *most* important thing is the easy to use
and the error handling well think

Click to expand...

It doesn't matter how easy to it is to use if it gives you the wrong
answer.
It doesn't matter how well it handles errors if it gives you the wrong
answer.
It doesn't matter how fast it is if it gives you the wrong answer.
It doesn't matter how much memory it uses if it gives you the wrong
answer.

Correctness comes first. Then we can argue about everything else.

Correctness is to do with the implementation. As a function in the
standard library, it is important to get its interface 'right' (which
relates to ease of use and error handling).

spinoza1111 · Feb 12, 2010

Is that a Bad Thing™?

Click to expand...

Towards the end of my own programming career (I left the field to
become a teacher), I had very good experience with Extreme Programming
in Pairs. Surprisingly, the embodied humanity of the two people
working together, along with a little professionalism, helps even
enemies work on code together. I could probably even work with
Seebach, after I got finished slapping him around. Just
kidding...about the bitch slap.

spinoza1111 · Feb 12, 2010

Seebs said:
Seebs said:

This may have gotten buried given that it started in a Nilges thread,
but it's actually pretty interesting.

Click to expand...

There was some discussion of algorithms to perform the task:
Inputs: target string, replacement string, original string
Output: a newly-allocated string containing the original
string with each copy of the target string replaced with the
replacement string.

Click to expand...

Here's a hunk of mine:

Click to expand...

for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
++count;
t += inlen;
}

Click to expand...

Here's a hunk of another one:

Click to expand...

These hunks are essentially performing the same part of the core loop;
find the next occurrence of the target string in the original string.

Click to expand...

The second one was written by Edward Nilges ("spinoza1111"). He offers
in its defense the assertion that it's the "best" algorithm. (Nevermind
that it's got a bug; the bug could be fixed easily.)

Click to expand...

My thinking:

Click to expand...

The first is better, not because it's less buggy (that could be fixed), but
because it's simpler to understand. It may, or may not, be less efficient.
However, efficiency will vary widely with input characteristics; for some
cases, the arguable inefficiencies may be completely irrelevant, while in
others, they'd be significant.

Click to expand...

I don't think you can dismiss the bugs so easily. I would argue that
it is not a coincidence that the more complex code has more bugs. I
can't write code that looks likespinoza1111'scode and get it right.
I *have* to write simpler code, broken into simple functions, or I have
no chance of getting it to be correct.

For example, the latest improvement introduced another bug[1] and I

For the record, the bug was always there. My changes using the
extensive regression tests tend in my experience not to introduce new
bugs.

don't think that is simple carelessness. Without a re-write I suspect
almost any change is as likely to introduce a bug as fix one (I tried
to see where the error was, but I gave up). If I was paid to fix it,

Then you did not understand the code. I agree that because it fuses
two different functions (find target and replace target) for
efficiency and the hell of it, it is a hard piece of code (and note
that no other solution I've seen adheres to my self-imposed rule of
not using string.h, yet most or all of the other solutions have bugs
that are announced but not fixed).

I also agree that reading C is hard. But if anyone could have
understood the code, you could have.

I'd simply start again and it would end a up as I have already posted,
though to mirror the behaviour I'd now have to add some argument
checking.

....and not use string.h. I recall that you did. Please correct me if I
am wrong.

<snip>

[1] Apparently when the target string is not present in the source
string: e.g. replace("a", "x", "b").

Eric Sosman · Feb 12, 2010

Correctness is to do with the implementation. As a function in the
standard library, it is important to get its interface 'right' (which
relates to ease of use and error handling).

The rightness or wrongness of the interface is not the
implementation's to decide: It must implement the interface(s)
the governing standard(s) describe. An implementation is *not*
at liberty to say "The fact that the FILE* argument is first
for fprintf() and last for fputs() is ugly; let's change fputs()
so the FILE* comes first and thereby improve the interface."
An implementation is *not* allowed to have strcat() return a
pointer to the output's terminating '\0' instead of to its
start, even though the latter would be more helpful. And so
on: the standard dictates the interface, and the implementors
get no say in the matter. (Some of us remember the pre-ANSI days
when implementors exercised their imagination more liberally; I
for one do not ever wish to return to that state of affairs.)

... which is not to say that I'm in love with the library as
it exists; the "bazaar" generates lots of annoying quirks. But
that's The Way Things Are, and we learn to cope. Implementors,
too, learn to cope. And not to invent, not in this domain.

spinoza1111 · Feb 13, 2010

The rightness or wrongness of the interface is not the
implementation's to decide: It must implement the interface(s)
the governing standard(s) describe. An implementation is *not*
at liberty to say "The fact that the FILE* argument is first
for fprintf() and last for fputs() is ugly; let's change fputs()
so the FILE* comes first and thereby improve the interface."
An implementation is *not* allowed to have strcat() return a
pointer to the output's terminating '\0' instead of to its
start, even though the latter would be more helpful. And so
on: the standard dictates the interface, and the implementors
get no say in the matter. (Some of us remember the pre-ANSI days
when implementors exercised their imagination more liberally; I
for one do not ever wish to return to that state of affairs.)

... which is not to say that I'm in love with the library as
it exists; the "bazaar" generates lots of annoying quirks. But
that's The Way Things Are, and we learn to cope. Implementors,
too, learn to cope. And not to invent, not in this domain.

This is an example of subservience to things (and ideas) which I find
limiting.

It's like the nonsense that started at Bell Northern Research three
years after I started and the incompetents came in.

Everyone divided themselves by tribes not defined by passion and
ability (which had created the two man tribe of Whitfield Diffie and
Bob Gaskins that invented Powerpoint) but by loyalty either to DEC or
IBM (this was the early 1980s).

Or more precisely:

We had fed the heart on fantasies,
The heart's grown brutal from the fare;
More substance in our enmities
Than in our love;

(WB Yeats, The Stare's Nest by My Window).

That is, the defects in the apparatus "loved" meant that the "love"
was merely a dislike of the other tribe.

This has devolved to the headhunter logic: that if an applicant has a
PhD in computer science and has implemented an OS for his thesis in C+
+, but the job requires C, she's not "qualified"...because the
headhunter (with some sort of crap business degree) can't put her into
an authoritarian box.

I'm programming in C at a somewhat higher level than anyone else in
this discussion, with all due humility. Nobody else has posted a
solution to the actual problem (do replace without string.h and fix
the bugs, don't stare at them), as far as I've seen. This despite the
fact that technically and in a unique way, I'm a newbie, having not
used C since 1991.

Learning a programming language per se (which is usually combined with
learning programming) is pretty trivial. In 1991, it was RTFM (reading
the fucking manual). Today, you use online help, intellisense and the
Internet to get up to speed, or back up. You make silly errors, but a
real man is unafraid to make silly errors (as opposed to many here).

What's amazing being that one can dip in almost anywhere in a semi-
readable and gloomy book by Theodore Adorno, a midcentury musicologist/
philosopher/sociologist who worked at a rather low level white collar
job, and got fired for trying to do it effectively:

I.Q. – The modes of conduct appropriate to the most progressive
technical state of development are not limited to the sectors, in
which they are actually promoted. Thus thinking submits to the social
supervision of its services not only where it is forced to do so by
its occupation, but comes to resembles such in its entire complexion.
Because thought has been well-nigh inverted into the solution of tasks
assigned to it, what is not assigned is also dealt with according to
the schema of the task. Thought, having lost its autonomy, no longer
trusts itself to comprehend something real for its own sake, in
freedom. This it leaves, with respectful illusion, to the highest-
paid, and makes itself measurable for this. It tends to behave, for
its own part, as if it had to unceasingly portray its usefulness. Even
where there is no nutshell to crack, thinking turns into training [in
English in original] for some sort of exercise or other.

What is "real"? It's not using "obeying the rules of the C standard"
as an excuse for not fixing bugs. And given that many of the "annoying
quirks" of the existing libraries are real bugs or primitive
limitations imposed by older technology, it seems to be that you DO
have to invent.

spinoza1111 · Feb 13, 2010

....code...

I noticed that in the latest (13 Feb) version of the code, with the
fix to the one-character replace found by Ben B, there were some file
nits: two long lines split and blanks at end of continued macro lines.

Let's see if I can get rid of these problems.

Also, I misspoke. The failure to handle Ben's example was caused by an
earlier change.

Here is the latest code, without the console output.

// ***************************************************************
// * *
// * replace() demo *
// * *
// * Demonstrates how to replace non-NUL-defined strings in C *
// * using a simple function, and bypassing string.h. *
// * *
// * C H A N G E R E C O R D --------------------------------- *
// * DATE PROGRAMMER DESCRIPTION OF CHANGE *
// * -------- ---------- --------------------------------- *
// * 02 07 10 Nilges Version 1.0 *
// * *
// * 02 07 10 Nilges Bug: partial matches not handled *
// * correctly: need to iterate search *
// * for match. *
// * *
// * 02 07 10 Nilges 1. Santosh suggested tests *
// * 2. Heathfield put the boot in re *
// * including malloc *
// * *
// * 02 07 10 Nilges 1. Remove string.h use and code *
// * strlen by hand *
// * 2. Add comment block and comments*
// * inline. *
// * 3. free() storage *
// * 4. Use macro for testing *
// * 5. Bug: calculation of *
// * lngNewLength used incorrect *
// * index (intIndex3 instead of 2)*
// * which caused a memory leak. *
// * *
// * 02 07 10 Nilges 1. Bug: Ike Naar test failed. *
// * At end of scan loop, main *
// * string pointer was not *
// * correctly updated from index3 *
// * *
// * 02 07 10 Nilges Added new Santosh test *
// * *
// * 02 07 10 Nilges Added new Ike Naar test *
// * *
// * 02 08 10 Nilges 1. Added some new comments *
// * 2. Make "check for a complete *
// * match "structured" by means *
// * of a tested assignment *
// * 3. Get rid of "replace at end" *
// * evilness: the only time this *
// * flag is meaningful is in the *
// * LAST segment. *
// * 4. Return replace count *
// * 5. TESTER macro assertion added *
// * *
// * 02 10 10 Nilges 1. Bug fix: in a partial match, *
// * the main scan index is set to *
// * one past the end, which misses*
// * full matches that start *
// * between the first character of*
// * the partial match and its end.*
// * *
// * No longer updating the main *
// * scan index (ptrIndex1) to the *
// * point of partial match *
// * failure: setting it one past *
// * the first character of the *
// * partial match. *
// * *
// * 2. Line up expected & actual *
// * results per excellent *
// * suggestion (who made this?) *
// * *
// * 3. After a partial match, update *
// * the main handle search index *
// * (ptrIndex1) to the first *
// * occurrence of the handle *
// * character after the start of *
// * the partial match, or to the *
// * index of the unmatched char. *
// * *
// * 021310 Nilges Bug: failed to handle a one-char *
// * replace (of a by x in b) since *
// * we set ptrIndex2 to point to NUL *
// * which made it appear that there *
// * was a match. When the main index *
// * to the master string (ptrIndex1) *
// * goes off the end of a cliff, we *
// * needed to break out of the search *
// * loop. *
// * *
// * This also entailed setting the *
// * target index ptrIndex2 to 0 at the*
// * beginning of each search so that *
// * it is not erroneously used to *
// * indicate that there's an insert *
// * needed at the end. *
// * *
// * I have also added additional tests*
// * to verify that the code works *
// * when the string and target end *
// * at the same time. *
// * ----------------------------------------------------------- *
// * *
// * "In the near future we shall have to live with the *
// * superstition that programming is 'so easy that even a *
// * Republican can do it!'" *
// * *
// * - E. W. Dijkstra *
// * *
// * *
// ***************************************************************
#include <stdio.h>
#include <stdlib.h>
// ***** Segmentation *****
struct TYPsegmentstruct
{ char * strSegment;
long lngSegmentLength;
struct TYPsegmentstruct * ptrNext; };
// ---------------------------------------------------------------
// Calculate string length
//
//
long strLength(char *strInstring)
{
char *ptrInstring;
for (ptrInstring = strInstring; *ptrInstring; ptrInstring++);
return ptrInstring - strInstring;
}

// ---------------------------------------------------------------
// Replace target by replacement string in master string
//
//
// Caution: the string returned by this function should be freed.
//
//
char * replace(char * strMaster,
char * strTarget,
char * strReplacement,
long * ptrReplacements)
{
char * ptrIndex0;
char * ptrIndex1;
char * ptrIndex2;
char * ptrIndex3;
char * ptrIndex4;
char * strNew;
char * strNewStart;
long lngNewLength;
long lngCount;
long lngReplacementLength;
struct TYPsegmentstruct * ptrSegmentStructStarts;
struct TYPsegmentstruct * ptrSegmentStruct;
struct TYPsegmentstruct * ptrSegmentStructPrev;
lngReplacementLength = strLength(strReplacement);
if (!*strTarget)
{
printf("Error in calling replace(): target can't be null");
abort();
}
ptrIndex1 = strMaster;
ptrSegmentStructPrev = 0;
lngNewLength = 0;
*ptrReplacements = 0;
while(*ptrIndex1)
{
ptrIndex0 = ptrIndex1;
ptrIndex2 = 0;
while (-1)
{
// --- Check for (one character) handle
for(;
*ptrIndex1 && *ptrIndex1 != *strTarget;
ptrIndex1++);
if (!*ptrIndex1) break;
// --- Check for complete match while remembering the
// --- last position of the handle
ptrIndex4 = 0;
for(ptrIndex2 = strTarget + 1,
ptrIndex3 = ptrIndex1 + 1;
*ptrIndex3
&&
*ptrIndex2
&&
*ptrIndex3 == *ptrIndex2;
ptrIndex3++, ptrIndex2++)
{
if (*ptrIndex3 == *strTarget
&&
ptrIndex4 == 0) ptrIndex4 = ptrIndex3;
}
// End test: check complete match, update main ptr past
// partial match while checking for end of loop
if ((!*ptrIndex2 ? ((*ptrReplacements)++, -1) : 0)
||
(!*ptrIndex3 ? (ptrIndex1 = ptrIndex3, -1) : 0))
break;
// Update the main search pointer
ptrIndex1 = (ptrIndex4 == 0 ? ptrIndex3 : ptrIndex4);
}
// --- Create new segment
if (!(ptrSegmentStruct =
malloc(sizeof(struct TYPsegmentstruct))))
abort();
ptrSegmentStruct->strSegment = ptrIndex0;
ptrSegmentStruct->lngSegmentLength =
ptrIndex1 - ptrIndex0;
ptrSegmentStruct->ptrNext = 0;
if (ptrSegmentStructPrev != 0)
ptrSegmentStructPrev->ptrNext = ptrSegmentStruct;
else
ptrSegmentStructStarts = ptrSegmentStruct;
ptrSegmentStructPrev = ptrSegmentStruct;
// --- Update mallocation length
lngNewLength += ptrSegmentStruct->lngSegmentLength +
(ptrIndex2 && !*ptrIndex2
?
lngReplacementLength
:
0);
// --- Get past end of target string & iterate
if (*ptrIndex1) ptrIndex1 = ptrIndex3;
}
// --- Allocate just enough storage for the new string
if (!(strNewStart = malloc(lngNewLength + 1))) abort();
// --- Build the new string whilst freeing the list
strNew = strNewStart;
ptrSegmentStruct = ptrSegmentStructStarts;
while (ptrSegmentStruct)
{
for (ptrIndex1 = ptrSegmentStruct->strSegment,
lngCount = 0;
lngCount < ptrSegmentStruct->lngSegmentLength;
ptrIndex1++, lngCount++, strNew++)
*strNew = *ptrIndex1;
if (ptrSegmentStruct->ptrNext
||
ptrIndex2 != 0 && !*ptrIndex2)
for (ptrIndex1 = strReplacement;
*ptrIndex1;
ptrIndex1++, ++strNew)
*strNew = *ptrIndex1;
ptrSegmentStructPrev = ptrSegmentStruct;
ptrSegmentStruct = ptrSegmentStruct->ptrNext;
free(ptrSegmentStructPrev);
}
*strNew = '\0';
return strNewStart;
}

// ---------------------------------------------------------------
// Statement-format test macro
//
//
#define TESTER(resultPtr, master, target, replacement, expected,
expectedReplacements, replacements) \
{ \
printf("Replace \"%s\" by \"%s\" in \"%s\"\n", \
(target), (replacement), (master)); \
printf("Expect \"%s\":\n \"%s\"\n", \
(expected), \
resultPtr = replace((master), \
(target), \
(replacement), \
&(replacements))); \
printf("Replacements expected: %d: replacements: %d\n", \
(expectedReplacements), \
(replacements)); \
if (!(strLength(resultPtr) \
== \
strLength(master) \
+ \
(strLength(replacement)-strLength(target)) \
* \
replacements)) \
printf("Assertion failed\n"); \
printf("\n\n"); \
free(resultPtr); \
}

// ---------------------------------------------------------------
// Main procedure
//
//
int main()
{
char *ptrResult;
long lngReplacements;
printf("\nReplace\n\n\n");
TESTER(ptrResult,
"1111123bbb1111123bbb11123bb11111231111112111111123",
"111123",
"ono",
"1onobbb1onobbb11123bb1ono1111112111ono",
4,
lngReplacements)
TESTER(ptrResult,
"bbb1111123bbbbb",
"111123",
"ono",
"bbb1onobbbbb",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid error",
"stupid error",
"miracle",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid error",
"stupid",
"miracle",
"a miracle error",
1,
lngReplacements)
TESTER(ptrResult,
"the stupid error",
"the stupid error",
"a miracle",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"the miracle",
"the",
"a",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"a miraclsnirpKamunkle",
"snirpKamunkle",
"e",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"a miraclesnirpKamunkle",
"a miracle",
"",
"snirpKamunkle",
1,
lngReplacements)
TESTER(ptrResult,
" a miraclesnirpKamunkle",
"a miracle",
"",
" snirpKamunkle",
1,
lngReplacements)
TESTER(ptrResult,
" a miraclesnirpKamunklea miraclea miracle",
"a miracle",
"",
" snirpKamunkle",
3,
lngReplacements)
TESTER(ptrResult,
"a miracle a miraclesnirpKamunkle a Miraclea
miracleamiracle",
"a miracle",
"",
" snirpKamunkle a Miracle",
4,
lngReplacements)
TESTER(ptrResult,
"a stupid errord",
"stupid error",
"miracle",
"a miracled",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid errod",
"stupid error",
"miracle",
"a stupid errod",
0,
lngReplacements)
TESTER(ptrResult,
"a sstupid error",
"stupid error",
"miracle",
"a smiracle",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid errorstupid error",
"stupid error",
"miracle",
"a miraclemiracle",
2,
lngReplacements)
TESTER(ptrResult,
"a stupid error stupiderror",
"stupid error",
"miracle",
"a miracle stupiderror",
1,
lngReplacements)
TESTER(ptrResult,
"bbbbbbbbbb",
"b",
"a",
"aaaaaaaaaa",
10,
lngReplacements)
TESTER(ptrResult,
"In the halls of R'yleh great %s lies dreaming",
"%s",
"Cthulu",
"In the halls of R'yleh great Cthulu lies dreaming",
1,
lngReplacements)
TESTER(ptrResult,
"%s%s%s%s%s%s",
"%s",
"Cthulu",
"CthuluCthuluCthuluCthuluCthuluCthulu",
6,
lngReplacements)
TESTER(ptrResult,
"banana",
"ana",
"oat",
"boatna",
1,
lngReplacements)
TESTER(ptrResult,
" a stupid errorstupid errorHeystupid errors",
"stupid error",
"+",
" a ++Hey+s",
3,
lngReplacements)
TESTER(ptrResult,
"foo barfoo barf",
"foo bar",
"bas",
"basbasf",
2,
lngReplacements)
TESTER(ptrResult,
"abab",
"ba",
"ba",
"abab",
1,
lngReplacements)
TESTER(ptrResult,
"abab",
"bab",
"boop",
"aboop",
1,
lngReplacements)
TESTER(ptrResult,
"banana",
"ana",
"ono",
"bonona",
1,
lngReplacements)
TESTER(ptrResult,
"a",
"x",
"b",
"a",
0,
lngReplacements)
TESTER(ptrResult,
"x",
"x",
"b",
"b",
1,
lngReplacements)
TESTER(ptrResult,
"egregious",
"egregious",
"egregious",
"egregious",
1,
lngReplacements)
TESTER(ptrResult,
"egregious",
"egregious",
"x",
"x",
1,
lngReplacements)
printf("\n\nTesting complete: check output carefully: \"Assertion
failed\" should not occur!\n\n");
return 0;
}

spinoza1111 · Feb 13, 2010

....latest code...

Most posters will probably know that in downloading the code you
probably must:

1. Remove all spaces after the continuation markers in the TESTER
macro (at least on the Microsoft compiler: isn't this a bug that the
backslash may not be followed by a space, or was this horse puckey
blessed by the standard?)

2. Rejoin all lines split past position 67 if this is what your
browser does:

2.1 The TESTER header
2.2 The test of "a miracle a miraclesnirpKamunkle a Miraclea
miracleamiracle": note that in the latest post, I screwed up the
repair of this line which is why you should get ONE unexpected result
in your test of the latest code. Sensible and well-intentioned
developers will be able to figure this one out on their own, although
Heathfield will probably start foaming at the mouth
2.3 The last printf

At this point the code still does what appears to be unnecessary work
in the form of examining partial matches that occur starting to the
left of another partial match. This is because it notes only the one-
character "handle" of the left-partial match.

However, we know that

* No match can contain a match inside of it because it is
impossible for a proper substring of a string to match the containing
string, where a "proper" substring of X, X', is one of length(X)-n
where n>0. In simple English, a string cannot contain itself as a
proper substring. I think.

* Therefore, if we find the character handle of the target
inside a partial or complete occurrence of the target, this is an
overlapping string starting at the handle.

It seemed like a good idea, perhaps because I am evading the issue of
Boyer, Moore, Knuth, Pratt, Morris and Snoid, this afternoon to put in
a subsearch for the n-character overlapping target inside the match
loop. But this proved to complexify the code both conceptually and in
the sense that the target matcher had to contain extra instructions:
"do we have a potential overlap?" and "is it matching as well as the
main scan?"

Buys you nothing. You might as well check the left overlap in the next
cycle of the loop.

I believe I have possibly arrived at the most efficient solution
possible without auxiliary tables, because my solution appears to
examine pairs of characters the least amount of times needed. Wow. But
that's one of those beautiful theories forever being set upon by
ruffians in the form of facts.

In view of the face that NO OTHER POSTER (to my knowledge) has met the
requirements (code a string replace function without string.h usage) I
hereby declare victory, and award myself the Golden Spinoza Order of
Purity and Truth har har.

Seriously, I have demonstrated, I believe, that Republicans can't code
worth dick, but they can (if Peter Seebach is still a Republican) say
"there is a bug in my code but I'm fucked if I'll fix it", and they
sure can backstab and lie. That is: there is an entire programming
world-view, a Gestalt, that is complete malarkey and normalized
deviance. It consists of the inchoate saws and maxims that have
accumulated over time, that have sifted down from on-high from capable
early developers, but which have become little more than a cargo
cult.

My code is hard to read, it is said (on dit). In a sense, it is. I
find it hard to read at times myself.

An entire story exists in the cargo cult of the programmer who is in
some versions of the story "too clever by half" and in others only
thinks he is, and who writes an enormous program that nobody can
figure out. The social function of the story is to normalize
differences in ability, and to cover up the fact that many companies
try to solve ill-defined and very hard problems with computers, often
in order to exploit their workers or cheat customers. It omits the
existence of real code written by ordinary members of the white collar
proletariat which solved in an unsung fashion a hard problem...such as
the assembler for the 8K IBM 1401 that was written by a Univ of Chi
graduate student that put the IBM assembler to shame.

When looking for work in the 1990s, I answered an ad from Rockwell-
Wescom, a telecom manufacturer in Chicago. During my interview, the
manager said "some guy many years ago figured out how to use our call
data to reconstruct calls by simulating the PBX: he was pretty
strange." I said, "that was me". I'd become a legend of the dreamtime.
I didn't get a job because, the manager said, I was "too good to be
true". There had to be something wrong with a guy who'd solved a
problem nobody else could solve but was looking for work in his
forties.

Whereas at a shipping conference in 2003 I was greeted with
recognition by the staff of a shipbuilding firm for whom I'd created a
key program in the 1980s. This was because their culture was one of
openness and trust, not myth, and it's important that at the
conference, they said that they weren't bothering to bid for work in
Iraq. It was an ethical firm, of the sort that is in my experience
rare.

Code in some instance might look challenging because the problem is
actually a bit hard. Likewise it was hard for NASA engineers to solve
the problem of O-ring temperatures in cold temperatures in the
Challenger space shuttle, and later for them to solve the problem of
foam insulation dropping off Columbia at launch.

"They say it couldn't be done. So, I didn't do it."

In a post-Enlightenment culture of real science that has sifted down
to low-level technicians distinguished only by their conformity,
passive-aggression, frightened political conservatism (or passive-
aggressive "libertarianism") and/or religious Fundamentalism, the
former science (including Dijkstra's misinterpreted admonitions
concerning simplicity and "the humble programmer") become a cargo cult
of excuses for normalized deviance.

In a recent collection of writings on popular music by my fat pal
Theodore Adorno, he sets out the contrast starkly between classical
and popular music, and this is applicable to programming.

Popular music, he wrote, seems free and liberating, but in actuality
is almost always oversimplified in structure (and this is still true
of pop music years after he wrote it). The dissonances and syncopation
rely for their effect, he wrote, on the fact that for the listener
they are not actual artistic and standalone statements, but instead
sardonic POINTERS (references) back to classical music, light
classical music, and more traditional pop music.

The pop composer, even Irving Berlin or Copland, relies for his effect
on a backreference which says inside the music "listen, I flatted that
note, but you can if you like normalize my deviance by treating what I
said as unserious".

Consider the difference between Phillip Glass or Ygor Stravinsky on
the one hand, and Alban Berg on the other. The audience, according to
a 1980s Time magazine article about Glass, happily "sighed" when
Glass's cheap backreferences to the most innocuous classical music
with an easy-listening rock beat started to drone. Likewise, the
anxious bourgeois applauded Stravinsky (and Picasso) in the 1920s when
they reverted to classical models by means of snarky backreferences
with just enough dissonance to make the audience feel hip.

Whereas Beethoven meant to write the entire symphony in such a way
that ANY change would damage the whole (whereas my formatting and
naming conventions are not something I can willingly discard). (I
really felt for Tom Hulce's Amadeus in the eponymous movie when he was
told to discard notes by the Emperor).

That is, the division is between serious and unserious in programming
as well as music (PJ Plauger's phrase applies: programming on
purpose).

A Pop culture, a cargo cult and culture, creates people like Seebach,
who lie and backstab as standard procedure, allows them to claim that
the bad is good, by formulaic protocol-statements and shibboleths that
have the same function as dissonance in Pop music: saying "oh and by
the way I don't look for %s" is a gesture possible only in a culture
sodden with Pop crap created to make money.

This isn't to say that there couldn't be "serious" Pop music, such as
Robert Johnson's blues, which Adorno knew nothing about. But note that
Johnson wasn't commercially successful. Instead, rich men like Jagger
and McCartney stole what he did. Like Copland and Berlin, they relied
for their best effects on a form of plagiarism that mocks its source.

spinoza1111 · Feb 13, 2010

Then it's shitty code, pure and simple.

That is the common view. I regard it as self-indulgent. Who says you
can read code?

spinoza1111 · Feb 13, 2010

spinoza1111wrote:

<snip>

I see you've ditched malloc.h, so here are my next two observations:

"When everything is ended, you come." - Shakespeare, Henry IV part 2

....dishonestly, like Falstaff, seeking glory. I find it hard to
believe that you've been keeping grave observations in reserve, since
you found no bugs, and would have been as my arch-enemy happy to do so
(stylistic observations about pseudo-portability don't count, and
there were several juicy bugs, none of which you were able to find).

Simpler, and slightly more informative to the caller:

long strLength(const char *s)
{
long len = 0;
while(*s++) ++len;
return len;

}

I'm massively underwhelmed. It's "simpler" only in having fewer
characters in the source code, and this isn't what we mean by
simplicity; mathematics isn't "simple and readable" in the way we
want. Your code saves one assignment: I'll alert the media.

Integrating the increment of s with the test is what you would call
"confusing code" if anyone other than you'd written it, because you're
a narcissist. It defeats what purpose there is in the C for loop,
which was to disambiguate the three parts of a loop. It did so poorly,
but your use manages to make a bad situation worse.

If you can write "len", you can write "length". "len" could be
anything.

Initializing len in its declaration-definition will not even compile
on some C compilers. I thought you wrote portable code. Oh well.

OK, how many errors do we have in your code, if we use your own thug's
standard for errors as "I don't like it?"

Three.

Other suggested improvements: (a) use an unsigned type, such as unsigned
long (strings might have 0 length, but they can never be negatively
long); and (b) drop the function completely, in favour of strlen.

Oh, I dunno. Some strings of nonsense (such as yours) should never
have been typed and have what we might call negative entropy, and we
could represent this as a negative string length.

If you intend this to be a general purpose solution, calling abort() is
not a good idea.

Yes. Monkey and typewriter, but yes. However, you may have noticed if
you were awake that the code is "packaged" as a proof of concept (the
concept being "writing a replace even outside the string library is
not rocket science, but it is too hard for Republicans and Limey
sods").

And Lo! Heathfield labors mightily to stay inside the Game:
But the combat is already over, to each side's shame or fame:
The shadows lengthen on the pitch, he cries play up for me
And so the Combatants turn once more to their game of reading C.
The crowd is hush'd, and expectant Pall falls upon one and all
And then the mighty Heathfield upon the Gods doth cry his call
And show us his solution, alas, 'tis...Trivi-all.
He's missed the point, the boat, the bus, and the London train
If this were not so funny it would give the Gentle pain:
He's rolled offside off the Pitch and onto the green grass
To tell me how to write a string length function, Lord, what an Ass!
In tones of grave and pompous Preaching he groans Admonishment:
Expecting us his Flock to receive him with Astonishment:
But alas in the remains of a day so drear there is silence in the
Crowd:
And then a Titter, then a Chortle, third a Guf-faw, then both long and
Loud
Resounds the Laughter of the Gods which dismays all Philistines
From Hill, to Dale, to Glen, to Pale, are hearden its load Screams:
And so it is, and was, and ever it shall be cried
That Heathfield has struck out and has kicked the ball, offside.

Oh, that was Lofty.

spinoza1111 · Feb 13, 2010

An unnecessary increment on every loop. Potentially expensive for long
blocks. Nilges might be faster in many cases.

Whoa, missed that. Thanks for pointing this out. The count is four.
There is no joy, I am certain, in Mudville.

spinoza1111 · Feb 13, 2010

Some of the best code I have dealt with turned out to need a while to
acclimatise to.

One should NEVER program to the lowest common denominator such as Bill
Cunningham for example.

There appears to be some strange opinion here that ALL code is
constantly being read by hundreds of people as the brilliance of your
output is ported and modded to 1000s of platforms and new projects. It
isn't. Far better to have a small routine complex but EFFICIENT that
"easy to read" and a dog. Complex? Comment it.

"Wisdom crieth without; she uttereth her voice in the streets: She
crieth in the chief place of concourse, in the openings of the gates:
in the city she uttereth her words, saying, How long, ye simple ones,
will ye love simplicity?"

spinoza1111 · Feb 13, 2010

Some of the best code I have dealt with turned out to need a while to
acclimatise to.

Click to expand...

One should NEVER program to the lowest common denominator such as Bill
Cunningham for example.

Click to expand...

There appears to be some strange opinion here that ALL code is
constantly being read by hundreds of people as the brilliance of your
output is ported and modded to 1000s of platforms and new projects. It
isn't. Far better to have a small routine complex but EFFICIENT that
"easy to read" and a dog. Complex? Comment it.

Click to expand...

"Wisdom crieth without; she uttereth her voice in the streets: She
crieth in the chief place of concourse, in the openings of the gates:
in the city she uttereth her words, saying, How long, ye simple ones,
will ye love simplicity?"

Looking over some of the other "solutions" to the problem, I note that
some use memcpy unnecessarily.

Whereas here is what I do when I need to create a segment block:

if (!(ptrSegmentStruct =
malloc(sizeof(struct TYPsegmentstruct))))
abort();
ptrSegmentStruct->strSegment = ptrIndex0;
ptrSegmentStruct->lngSegmentLength =
ptrIndex1 - ptrIndex0;
ptrSegmentStruct->ptrNext = 0;

Not being burdened with "tools" like memcpy that in many ways want to
think for me, I here am as slick as I can be, for I insert a pointer
to the segment start and its length, whereas other alternate solutions
might do extra mem copies, perhaps into the structure, because "it is
good to use tools and unsafe to reinvent the wheel" and other nursery
maxims of the programming cargo cult.

Of course, to cash-out the final string, we do have to do memory
moves:

for (ptrIndex1 = ptrSegmentStruct->strSegment,
lngCount = 0;
lngCount < ptrSegmentStruct->lngSegmentLength;
ptrIndex1++, lngCount++, strNew++)
*strNew = *ptrIndex1;
if (ptrSegmentStruct->ptrNext
||
ptrIndex2 != 0 && !*ptrIndex2)
for (ptrIndex1 = strReplacement;
*ptrIndex1;
ptrIndex1++, ++strNew)
*strNew = *ptrIndex1;

Oh I didn't use memcpy. Hey wait a minute, did I remember to put a Nul
at the end?

Whew:

*strNew = '\0';
return strNewStart;

In fact, if you have my structure and the replacement string, this
data assembly constitutes a virtual new string which in a large and
integrated program could become a space-saving way to represent a
class of similar strings.

Note that while purporting to do our thinking for us, "tools" let us
down (an English officer in September 1914 defined a "battle" as "that
which takes place at the juncture of two different maps"). memcpy
could lull us into forgetting to add the Nul. Not using it keeps us
awake. More generally, "reinventing the wheel" defines us as men, homo
faber, and are we not men.

Implementing strstr	229	Mar 19, 2010
Future standard GUI library	51	May 18, 2013
Multithreading and compatibility library (libconfig)	1	Jan 23, 2013
Problem with a login script, SESSION user rights and put this together so it works with the other pages and MySQL. Code examples.	2	May 5, 2023
AW: Pure python standard library and License	0	Mar 4, 2011
What is the most astounding C++ syntax construct?	0	Dec 22, 2022
review of the "container library", part 1/?	18	Mar 1, 2011
ParseTree and the Standard Library	0	Jul 27, 2007

Efficency and the standard library

Dann Corbit

Michael Foukarakis

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

Anand Hariharan

spinoza1111

spinoza1111

Eric Sosman

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads