Warning to newbies

spinoza1111 · Feb 16, 2010

Willem wrote:

) However, I have to disappoint you: it's not really a recursive solution,
) It will always recurse exactly once. A smart compiler could inline the
) whole thing, so you get two copies of the loop, with all the if(t) checks
) removed (i.e. one with, and one without the body of that if(t) ).
)
)
) Perhaps as a new challenge one could write a truely recursive solution,
) though. I wonder if it would be possible to do it elegantly.

Without further ado: the really recursive solution:
This time, I have tested it.
(And again, to my dismay, there was one silly bug the first time 'round.)

Note: I do not recommend this function to anybody for reasons other
than academic interest. Note, for example, that the length of the
replacement is recalculated on every recursion but the first.

char *repstr(char *s, char *m, char *r, size_t tl)
{
char *t;
size_t sl = 0, ml = 0, rl;
while (s[sl]) {
if (m[ml] == 0) {
for (rl = 0; r[rl]; rl++) ;
if (t = repstr(s+sl+ml, m, r, tl+sl+rl)) {
while (rl--) { t[tl+sl+rl] = r[rl]; }
while (sl--) { t[tl+sl] = s[sl]; }
}
return t;
}
if (s[sl+ml] != m[ml]) {
sl++;
ml = 0;
} else {
ml++;
}
}
if (t = malloc(tl + sl + 1)) {
do { t[tl+sl] = s[sl]; } while (sl--);
}
return t;

}

char *replace_string(char *str, char *match, char *replace)
{
if (match[0] == 0) return 0; /* Sanity check */
return repstr(str, match, replace, 0);

}

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Let's see, here's the basic idea in one line of pseudo-functional
pseudo-code in a pseudo-language that supports, in addition to the
regular operators, a C-like assignment (assert t is not null), and
garbage collection, that among other things, allows us to "guard" m by
t in index(m+t, t):

replace(m, t, r) = m == "" ? "" : (substr(m, 0, (i = index(m + t, t)))
+ ((i >= 0 && i < len(m)) ? r : "") + replace(substr(m, i+len(t)), t,
r))

Let us transliterate this to a language which supports garbage
collection, C Sharp:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace replaceCSharp
{
class Program
{
static void Main(string[] args)
{ TESTER(
"1111123bbb1111123bbb11123bb11111231111112111111123",
"111123",
"ono",
"1onobbb1onobbb11123bb1ono1111112111ono");
TESTER("bbb1111123bbbbb",
"111123",
"ono",
"bbb1onobbbbb");
TESTER("a stupid error",
"stupid error",
"miracle",
"a miracle");
TESTER("a stupid error",
"stupid",
"miracle",
"a miracle error");
TESTER("the stupid error",
"the stupid error",
"a miracle",
"a miracle");
TESTER("the miracle",
"the",
"a",
"a miracle");
TESTER("a miraclsnirpKamunkle",
"snirpKamunkle",
"e",
"a miracle");
TESTER("a miraclesnirpKamunkle",
"a miracle",
"",
"snirpKamunkle");
TESTER(" a miraclesnirpKamunkle",
"a miracle",
"",
" snirpKamunkle");
TESTER(" a miraclesnirpKamunklea miraclea miracle",
"a miracle",
"",
" snirpKamunkle");
TESTER("a stupid errord",
"stupid error",
"miracle",
"a miracled");
TESTER("a stupid errod",
"stupid error",
"miracle",
"a stupid errod");
TESTER("a sstupid error",
"stupid error",
"miracle",
"a smiracle");
TESTER("a stupid errorstupid error",
"stupid error",
"miracle",
"a miraclemiracle");
TESTER("a stupid error stupiderror",
"stupid error",
"miracle",
"a miracle stupiderror");
TESTER("bbbbbbbbbb",
"b",
"a",
"aaaaaaaaaa");
TESTER("In the halls of R'yleh great %s lies dreaming",
"%s",
"Cthulu",
"In the halls of R'yleh great Cthulu lies
dreaming");
TESTER("%s%s%s%s%s%s",
"%s",
"Cthulu",
"CthuluCthuluCthuluCthuluCthuluCthulu");
TESTER("banana",
"ana",
"oat",
"boatna");
TESTER(" a stupid errorstupid errorHeystupid errors",
"stupid error",
"+",
" a ++Hey+s");
TESTER("foo barfoo barf",
"foo bar",
"bas",
"basbasf");
TESTER("abab",
"ba",
"ba",
"abab");
TESTER("abab",
"bab",
"boop",
"aboop");
TESTER("banana",
"ana",
"ono",
"bonona");
TESTER("a",
"x",
"b",
"a");
TESTER("x",
"x",
"b",
"b");
TESTER("egregious",
"egregious",
"egregious",
"egregious");
TESTER("egregious",
"egregious",
"x",
"x");
TESTER("",
"egregious",
"x",
"");
}

private static void TESTER(string m,
string t,
string r,
string e)
{
Console.WriteLine("Replace " +
"\"" + t + "\" by " +
"\"" + r + "\" in " +
"\"" + m + "\": " +
Environment.NewLine +
"expecting " + "\"" + e + "\"" +
Environment.NewLine +
" " +
"\"" +
replace(m, t, r) +
"\"");
}

static string replace(string m, string t, string r)
{
if (t == "") return "";
int i = 0;
int j = 0;
return m == ""
?
""
:
(m.Substring(0, (i = (m + t).IndexOf(t)))
+
((i >= 0 && i < m.Length) ? r : "")
+
((j = i + t.Length) >= m.Length
?
""
:
replace(m.Substring(j), t, r)));
}
}
}

This worked almost immediately for all our test cases (about 30 mn,
one bug owing to the fact that C sharp does not consider a substring
with an index beyond the end of the string to be valid).

The lesson is that recursive expressions, once inspected by
intelligent and agreeable chaps for correctness, constitute proofs of
validity which are easily transformed into reliable code, as witness
your speed at coding the recursive solution.

The objection to recursion is that the stack might get overloaded and
tromp on top of the heap if (as Schildt tells us, vividly but not in
all cases correctly, needless to say) the heap floats above the stack.
But if your machine is hardware optimized for stacks no prob.

David Thompson · Feb 16, 2010

I'll contribute one [ObC].

I recently wanted to write something which would derive a string value,
then substitute it into another string. Actually, into several other strings.

The problem is that it might be substituted several times, so I couldn't
just do something like:

char *replace = "replace_here:%s";

and use sprintf.

Well you _could_ just pass many copies of the new value pointer, such
that they are more than enough for any 'reasonable' use. Extra varargs
are safely ignored, and the slight inefficiency almost certainly
doesn't matter. But that's even more hacky than what you did.

(In this particular case, the values are "environment variables", but that's
a detail of no particular relevance to C.)

My solution, which is a bit hacky, but I sorta liked:

struct { char *name, *template; } new_environs[] = {
{ "PERL5LIB", "%s/lib64/perl5/5.10.0/i686-linux:%s/lib64/perl5/5.10.0:%s/lib64/perl5/site_perl/5.10.0/i686-linux:%s/lib64/perl5/site_perl/5.10.0:%s/lib64/perl5/vendor_perl/5.10.0/i686-linux:%s/lib64/perl5/vendor_perl/5.10.0:%s/lib64/perl5/vendor_perl:." },
{ "LD_LIBRARY_PATH", "%s/lib64:%s/lib:%s/lib64" },
{ 0, 0 }
};

Then, down below:

for (i = 0; new_environs.name; ++i) {
char *new_environ;
char *t;
char *u;

Presumably s is (already) in the containing scope, and also i. I would
slightly prefer to keep all the vars/decls together. Of course if you
make this a separate function, which is pretty small, as the
downthread development does, you get that for free.

size_t len = strlen(new_environs.template) + 5;
for (s = new_environs.template; t = strchr(s, '%'); s = t + 1) {
len += strlen(parent_path);
}
new_environ = malloc(len);

Click to expand...

I would at least assert(nonnull) even in Q&D hack code. Yes many
probably most machines will trap nicely enough, but I don't want to
even admit to my brain the dangerous thought I can rely on it.

u = new_environ;
*u = '\0';
for (s = new_environs.template; t = strchr(s, '%'); s = t + 2) {
u += snprintf(u, (len - (u - new_environ)), "%.*s%s",
t - s, s, parent_path);
}
snprintf(u, (len - (u - new_environ)), "%s", s);

Click to expand...

As already changed downthread, you don't actually need snprintf
because you've ensured the output buffer is big enough.

But you say (reasonably) you were just being Q&D-safe here, and in
other situations it is needed. Given that, for some reason I always
stumble a little on reading N - (ptr-base) and prefer base+N - ptr.
Or use an offset rather than a pointer:
size_t u = 0; ... u += snprintf (base+u, N-u, ...); ...
/* I like the symmetry of +u and -u there;
especially if I also have, as I sometimes do, a struct to manage
such buffers so it's more like zorg.ptr +u, zorg.size -u */
Though here the input side strongly wants to be pointers, and symmetry
on the output side is nice.

The calculations are very careless; my goal was not to calculate exactly
the right length, but rather, to calculate a length which could be easily
shown to be definitively long enough.

Click to expand...

I understand Q&D bias to the safe side, but even so the 5 mystifies
me. Do you recall a reason, or is that too archeological?

Nick Keighley · Feb 16, 2010

[...]

Replace "egregious" by "x" in "egregious"
Expect "x":
"x"

Replace "egregious" by "x" in ""
Expect "":
""

Testing complete: check output carefully

you could avoid this by making your test subroutine do the check for
you. The TDD people rave about this. Plauger's Library book uses
assert() as a simple test harness.

Seebs · Feb 16, 2010

I understand Q&D bias to the safe side, but even so the 5 mystifies
me. Do you recall a reason, or is that too archeological?

Pure Q&D bias. I got burned once recently on adding 1 to a buffer and then
using 3 more bytes than the underlying length, and I was in a hurry. When
we cleaned it up further, we did it more correctly (using the version I posted
here in response to the Nilges String Replace Challenge).

-s

spinoza1111 · Feb 17, 2010

I'll contribute one [ObC].

Click to expand...

I recently wanted to write something which would derive a string value,
then substitute it into another string. Actually, into several other strings.

Click to expand...

The problem is that it might be substituted several times, so I couldn't
just do something like:

Click to expand...

char *replace = "replace_here:%s";

Click to expand...

and use sprintf.

Click to expand...

Well you _could_ just pass many copies of the new value pointer, such
that they are more than enough for any 'reasonable' use. Extra varargs
are safely ignored, and the slight inefficiency almost certainly
doesn't matter. But that's even more hacky than what you did.

(In this particular case, the values are "environment variables", but that's
a detail of no particular relevance to C.)

Click to expand...

My solution, which is a bit hacky, but I sorta liked:

Click to expand...

struct { char *name, *template; } new_environs[] = {
{ "PERL5LIB", "%s/lib64/perl5/5.10.0/i686-linux:%s/lib64/perl5/5.10..0:%s/lib64/perl5/site _perl/5.10.0/i686-linux:%s/lib64/perl5/site_perl/5.10.0:%s/lib64/perl5/vend or_perl/5.10.0/i686-linux:%s/lib64/perl5/vendor_perl/5.10.0:%s/lib64/perl5/ vendor_perl:." },
{ "LD_LIBRARY_PATH", "%s/lib64:%s/lib:%s/lib64" },
{ 0, 0 }
};

Click to expand...

Then, down below:

Click to expand...

for (i = 0; new_environs.name; ++i) {
char *new_environ;
char *t;
char *u;

Click to expand...

Presumably s is (already) in the containing scope, and also i. I would
slightly prefer to keep all the vars/decls together. Of course if you
make this a separate function, which is pretty small, as the
downthread development does, you get that for free.

size_t len = strlen(new_environs.template) + 5;
for (s = new_environs.template; t = strchr(s, '%'); s = t + 1) {
len += strlen(parent_path);
}
new_environ = malloc(len);

Click to expand...

I would at least assert(nonnull) even in Q&D hack code. Yes many
probably most machines will trap nicely enough, but I don't want to
even admit to my brain the dangerous thought I can rely on it.

u = new_environ;
*u = '\0';
for (s = new_environs.template; t = strchr(s, '%'); s = t + 2) {
u += snprintf(u, (len - (u - new_environ)), "%.*s%s",
t - s, s, parent_path);
}
snprintf(u, (len - (u - new_environ)), "%s", s);

Click to expand...

As already changed downthread, you don't actually need snprintf
because you've ensured the output buffer is big enough.

But you say (reasonably) you were just being Q&D-safe here, and in
other situations it is needed. Given that, for some reason I always
stumble a little on reading N - (ptr-base) and prefer base+N - ptr.
Or use an offset rather than a pointer:
size_t u = 0; ... u += snprintf (base+u, N-u, ...); ...
/* I like the symmetry of +u and -u there;
especially if I also have, as I sometimes do, a struct to manage
such buffers so it's more like zorg.ptr +u, zorg.size -u */
Though here the input side strongly wants to be pointers, and symmetry
on the output side is nice.

The calculations are very careless; my goal was not to calculate exactly
the right length, but rather, to calculate a length which could be easily
shown to be definitively long enough.

Click to expand...

I understand Q&D bias to the safe side, but even so the 5 mystifies
me. Do you recall a reason, or is that too archeological?

Wow...sprintf and its patch snprintf.

sprintf can fail in production, egregiously, at any time.

And...so does snprintf. It FAILS. Because in production it simply
refuses to work if n is too large. Not as dangerous as sprintf, but
allows the little programmer to make a decision for which he's not
generally qualified.

Anything to avoid a little sweat. The biggest crime is "taking too
long", in reality, thinking.

Willem has produced with a little encouragement and assistance from me
an elegant recursive solution. I worked a couple of hours with
collaborative and professional assistance from Naar, Santosh,
Thomasson and others to produce a less elegant linked list
solution...which Seebs is afraid to audit for >5 bugs, even though for
him to find one would allow him to crow.

Instead...using sprintf (and risking what .Net expert and nice guy Dan
Appleman calls the worst bug, the one you never know about) or
snprintf (and risking not doing the job for a potentially infinite
number of useful cases).

spinoza1111 · Feb 17, 2010

[...]

Replace "egregious" by "x" in "egregious"
Expect "x":
"x"

Click to expand...

Replace "egregious" by "x" in ""
Expect "":
""

Click to expand...

Testing complete: check output carefully

Click to expand...

you could avoid this by making your test subroutine do the check for
you. The TDD people rave about this. Plauger's Library book uses
assert() as a simple test harness.

Good point, but: there's always at some point where the rubber meets
the road, and a human bean has to step up and check things. I like
looking over the test cases. What would test the compare test?
Infinite regress.

spinoza1111 · Feb 17, 2010

spinoza1111 said:
spinoza1111 said:

I don't understand. If you claim this "beat" me, would you mind
posting a console output with the test suite in my program? I was
willing to concede toWillemand am willing to concede to you,
although I find your code "hard" to read. The readability of my code
is a readability to someone with a high level of English, therefore,
in resolving this "contest", we should just forget about
"readability", since we seem to have coders of all nations.

Click to expand...

This is turning into the Vancouver Olympics...

Click to expand...

If you not care of readability, not for languages programming, compiler and OS
and little macros, this would be one asm-cpp program

[in few words the function has 2 pass
first pass:
calculate the final string len
second pass
reserve memory for that len
and copy the string and
(has in count that the original strings can be chaged)]

/* bcc32 -v file.cpp asmfile.obj */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define P printf
#define R return

extern "C" char* __stdcall
Replacer_m(unsigned* len, char* origin, char* whatSost, char* sost);

// origin has to came from malloc memory
int Print(char** origin, char* whatSost, char* sost)
{char *p;
unsigned len;
int i;
p=Replacer_m(&len, *origin, whatSost, sost);
if(p==0) return 0;
P("Origin=%s|WhatS=%s|Sost=%s\n", *origin, whatSost, sost);
P("Result=%s len=%u slen=%u\n", p, len, strlen(p));
free(*origin); *origin=p;
R 1;

}

// origin has to came from malloc memory
int tester(char* origin, char* whatSost, char* sost,
char* expected, unsigned n)
{char *p;
unsigned len;
int i;

p=Replacer_m(&len, origin, whatSost, sost);
if(p==0) R 0;
if(strcmp(expected, p)!=0 || len!=strlen(p))
{P("I find one error on test %u:\n", n);
P("Origin=%s|WhatS=%s|Sost=%s\n", origin, whatSost, sost);
P("Expect \"%s\":\nResult \"%s\"\n", expected, p);
P("lenResult=%u strlenResult=%u\n", len, strlen(p));
free(p);
R 0;
}
free(p);
return 1;

}

// ---------------------------------------------------------------
// Main procedure
//
//
int test1(void)
{ unsigned r;
P("\n\nTest1\n");
r=1;
r*=tester("a stupid error","stupid error","miracle","a miracle", 1);
r*=tester("a stupid error","stupid","miracle","a miracle error", 2);
r*=tester("the stupid error","the stupid error","a miracle",
"a miracle", 3);
r*=tester("the miracle","the","a","a miracle", 4);
r*=tester("a miraclsnirpKamunkle","snirpKamunkle","e",
"a miracle", 5);
r*=tester("a miraclesnirpKamunkle","a miracle",
"","snirpKamunkle", 6);
r*=tester(" a miraclesnirpKamunkle","a miracle",
""," snirpKamunkle", 7);
r*=tester(" a miraclesnirpKamunklea miraclea miracle","a miracle",
""," snirpKamunkle", 8);
r*=tester("a miracle a miraclesnirpKamunkle a Miraclea miraclea miracle",
"a miracle",""," snirpKamunkle a Miracle", 9);
r*=tester("a stupid errord","stupid error","miracle","a miracled", 10);
r*=tester("a stupid errod","stupid error","miracle","a stupid errod", 11);
r*=tester("a sstupid error","stupid error","miracle","a smiracle", 12);
r*=tester("a stupid errorstupid error","stupid error","miracle",
"a miraclemiracle", 13);
r*=tester("a stupid error stupiderror","stupid error","miracle",
"a miracle stupiderror", 14);
r*=tester("bbbbbbbbbb","b","a","aaaaaaaaaa", 15);
r*=tester("In the halls of R'yleh great %s lies dreaming","%s",
"Cthulu","In the halls of R'yleh great Cthulu lies dreaming", 16);
r*=tester("%s%s%s%s%s%s","%s","Cthulu",
"CthuluCthuluCthuluCthuluCthuluCthulu", 17);
r*=tester("banana","ana","oat","boatna", 18);
r*=tester(" a stupid errorstupid errorHeystupid errors","stupid error",
"+"," a ++Hey+s", 19);
r*=tester("foo barfoo barf","foo bar","bas","basbasf", 20);
r*=tester("abab","ba","ba","abab", 21);
r*=tester("a", "a", "b", "b", 22);
r*=tester("1111123bbb1111123bbb11123bb11111231111112111111123",
"111123", "ono",
"1onobbb1onobbb11123bb1ono1111112111ono", 23);
r*=tester("bbb1111123bbbbb", "111123", "ono", "bbb1onobbbbb", 24);
r*=tester("a stupid error", "stupid error", "miracle", "a miracle", 25);
r*=tester("a stupid error", "stupid", "miracle", "a miracle error", 26);
r*=tester("the stupid error", "the stupid error", "a miracle",
"a miracle", 27);
r*=tester("the miracle", "the", "a", "a miracle", 28);
r*=tester("a miraclsnirpKamunkle", "snirpKamunkle", "e", "a miracle", 29);
r*=tester("a miraclesnirpKamunkle", "a miracle", "", "snirpKamunkle", 30);
r*=tester(" a miraclesnirpKamunkle","a miracle", ""," snirpKamunkle", 31);
r*=tester(" a miraclesnirpKamunklea miraclea miracle", "a miracle", "",
" snirpKamunkle", 32);
r*=tester("a miracle a miraclesnirpKamunkle a Miraclea miraclea miracle",
"a miracle", "", " snirpKamunkle a Miracle", 33);
r*=tester("a stupid errord", "stupid error", "miracle", "a miracled", 34);
r*=tester("a stupid errod", "stupid error",
"miracle", "a stupid errod", 35);
r*=tester("a sstupid error", "stupid error","miracle", "a smiracle", 36);
r*=tester("a stupid errorstupid error", "stupid error", "miracle",
"a miraclemiracle", 37);
r*=tester( "a stupid error stupiderror", "stupid error", "miracle",
"a miracle stupiderror", 38);
r*=tester("bbbbbbbbbb", "b", "a", "aaaaaaaaaa", 39);
r*=tester("In the halls of R'yleh great %s lies dreaming", "%s",
"Cthulu",
"In the halls of R'yleh great Cthulu lies dreaming", 40);
r*=tester("%s%s%s%s%s%s", "%s", "Cthulu",
"CthuluCthuluCthuluCthuluCthuluCthulu", 41);
r*=tester("banana", "ana", "oat", "boatna", 42);
r*=tester(" a stupid errorstupid errorHeystupid errors", "stupid error",
"+", " a ++Hey+s", 43);
r*=tester("foo barfoo barf", "foo bar", "bas", "basbasf", 44);
r*=tester("abab","ba","ba", "abab", 45);
r*=tester("abab", "bab", "boop", "aboop", 46);
r*=tester( "banana", "ana", "ono", "bonona", 47);
r*=tester( "a", "x", "b", "a", 48);
r*=tester("x","x","b","b", 49);
r*=tester("egregious","egregious","egregious", "egregious", 50);
r*=tester("egregious","egregious","x","x", 51);
r*=tester("","","","", 52);
r*=tester("x","x","","", 53);
r*=tester("x", "","","x", 54);
if(r==0) {P("Test 1: Some error occurs\n\n"); R 0;}
r=tester(0,0,0,0,55);
if(r==0) P("Test 1: Find No error\n\n");
else {P("Test 1: the 0 arg Error\n\n"); R 0; }
R 1;

}

int main(void)
{test1();
return 0;}

-----------------
; nasmw -fobj asmfile.asm
section _DATA use32 public class=DATA

global Replacer_m
extern _realloc
extern _free

section _BSS use32 public class=BSS
section _TEXT use32 public class=CODE

; i8* __stdcall
; Replacer_m(u32* len, i8* origin, i8* whatSost, i8* sost)
; La Funzione alloca in memoria dinamica un array
; dove viene copiato tutta "origin" dove ogni sottostringa
; in "origin" che contiene
; "whatSost" viene sostituita dalla stringa "sost".
;
; Se non sono stati trovati errori, ritorna tale stringa
; da liberare successivamente con free, e la lunghezza
; della stringa ritornata in "len" e CF=0
;
; Altrimenti se quache errore viene trovato: ritorna 0
; [len=0] CF=1
;
; La funzione calcola la presunta lunghezza dell'array
; riserva la memoria, fa la copia nella memoria allocata
; tenendo conto che e' possibile che le stringhe di origine
; siano cambiate da questo programma o da un altro programma.
;
; esempio:
; u32 len;
; i8 *p;
; p=Replacer_m(&len, "qwerty", "ty", "1111111");
; if(p) printf("%s\n", p);
; else printf("Errore\n")
; free(p);
; risultato -> "qwer1111111"
; 0k,4j,8i,12b,16ra, 20P_Len, 24P_fmt, 28P_WhatSost 32P_Sost + 64
; 84 88 92 96
; 0LedSostitStringa, 4sizeOfResult, 8LedSrtingaSubst\0
; 12LedTheSecondPass
align 4
Replacer_m:
push ebx
push esi
push edi
push ebp
sub esp, 64
cmp dword[esp+ 84], 0
je .e0
mov esi, dword[esp+ 88]
cmp dword[esp+ 92], 0
je .e0
cmp dword[esp+ 96], 0
je .e0
cmp esi, 0
je .e0
mov eax, dword[esp+ 84]
mov dword[eax], 0
mov edi, 0
mov dword[esp+ 0], 0
mov dword[esp+ 4], 0
mov dword[esp+ 8], 0
mov dword[esp+ 12], 0
mov ebx, 0
jmp short .0
.e: push edi
call _free
add esp, 4
.e0: xor eax, eax
stc
jmp .z
.0: cmp ebx, dword[esp+ 4]
jb .1
cmp dword[esp+ 12], 0
je .1
mov ecx, ebx
add ecx, 32
push ecx
push ecx
push edi
call _realloc
add esp, 8
pop ecx
cmp eax, 0
je .e
mov edi, eax
mov dword[esp+ 4], ecx
.1: cmp dword[esp], 1
jne .3
xor eax, eax
mov al, [ebp]
cmp eax, 0
jne .2
mov dword[esp+ 0], 0
cmp dword[esp+ 8], 1
je .7
jmp short .0
.2: cmp dword[esp+ 12], 0
je .2a
mov byte[edi+ebx], al
.2a: inc ebp
inc ebx...

read more »

Thanks again, good work, but I can't analyze this to the depth I would
like.

spinoza1111 · Feb 17, 2010

Pure Q&D bias. I got burned once recently on adding 1 to a buffer and then
using 3 more bytes than the underlying length, and I was in a hurry. When
we cleaned it up further, we did it more correctly (using the version I posted
here in response to the Nilges String Replace Challenge).
Where?

-s

Nick Keighley · Feb 17, 2010

[...]

Click to expand...

Replace "egregious" by "x" in "egregious"
Expect "x":
"x"
Replace "egregious" by "x" in ""
Expect "":
""

Click to expand...

Click to expand...

you could avoid this by making your test subroutine do the check for
you. The TDD people rave about this. Plauger's Library book uses
assert() as a simple test harness.

Click to expand...

Good point, but: there's always at some point where the rubber meets
the road, and a human bean has to step up and check things. I like
looking over the test cases. What would test the compare test?
Infinite regress.

I generally run a simple case where a human checks the output. Or I
run a case that is supposed to fail. Humans are poor at checking
repetitive things so if your test harness output went on for pages or
had to be run many times I'd distrust a purely human checked result. A
human could certify the test subroutine was ok by inspection. This
only has to be done once.

--
<shrug> Repeat the mantra: "All software sucks". Software,
regardless of the
language or OS, is being used to handle real-world, life-or-death
problems.
THAT should cause fear, except that the alternative is for every
single
emergency to be handled entirely by humans... <shudder>

Nick · Feb 18, 2010

Seebs said:
Eek!

No wonder my boss never seems to accept "argued with Edward Nilges" as
a week's work.

Actually, in a fit of irony, we ended up needing the revised string replace
function I posted here the other day, because new data revealed that our
original spec wasn't flexible enough. Needless to say, it worked fine
on the first try.

You see! Now you've got nothing to do. Had you been properly
educated (spent the last 30 years taking CS classes for example) you'd
still be arguing about whether to use the standard library and would
still have plenty of work for the next three months.

Ben Bacarisse · Feb 18, 2010

io_x said:
How are you so sure that
strlen(src) + matches*rlen - matches*mlen + 1
not overflow the unsigned type?

I'm not. In fact I know that there can't be any simple expression
that does not cause the result to be wrong because it can be wrong "by
logic" so to speak. If the input is "axxxx....xxx" of length SIZE_MAX
replacing the 'a' with "bb" will result in a string whose length is
not representable in a size_t.

You have a good point none the less. I should have written

strlen(src) - matches*mlen + matches*rlen + 1

because it will work for a wider range of inputs.

this fail my test for
src match replacement expected n. of test
r*=tester( "" , "", "", "", 52);
r*=tester( "x", "", "", "x", 54);

going in one never ending loop

Yes. I opted to make the fact that match should not be empty part of
the contract with the caller, in the same way that the pointers can't
be null either. This may seem odd and it is certainly debatable, but
there is some logic to it. When using C, I am not a fan of functions
that use the narrow channel of the return value to tell the caller
things that they know (or could know very simply) such as a NULL
argument or an empty match. The caller must decide what to do when
the match string is null so making that part of the contract forces
the caller to either ensure that this situation does not happen or to
handle it as they see fit. Returning NULL (as some people have
chosen to do) seems to me to be unhelpful. The caller needs to check for
NULL anyway, but will now have to disambiguate between NULL due to no
memory and NULL due to some other logic error that they could have
tested for before the call rather than after getting a NULL return.

The only time that returning NULL is a clear win is when that
is exactly what the caller wants in the particular situation and I
think that is probably not true for this particular case.

I say "in C" because in a language with exceptions I'd be happy to
have the function signal different error conditions that way. The
exceptions simply provide another way to structure the code: the
caller can test for empty match strings /before/ the call of they can
handle the "null match" exception after.

Ben Bacarisse · Feb 18, 2010

io_x said:
the above seems the faster too

Test1
Test Ben
Test=1 Speed=10296|Test=2 Speed=3208|Test=3 Speed=3688|Test=4
Speed=3280|Test=5 Speed

This is not a very readable format, but unless "Speed" is misnamed,
your data show that mine is the slowest!

If it is "Time" rather than "Speed" then your results are similar to
mine. In many cases, the simple double strstr version is faster than
the complex, buggy, one from spinoza1111. Yours may be simple too, I
don't remember -- sorry.

A little work to avoid re-scanning for matches and the code can be
made very much faster. I use an array since this can be used to
special case the situation where there are only a few matches.

Seebs · Feb 18, 2010

You see! Now you've got nothing to do. Had you been properly
educated (spent the last 30 years taking CS classes for example) you'd
still be arguing about whether to use the standard library and would
still have plenty of work for the next three months.

No wonder I've had so much trouble improving my efficiency ratings.

-s

blmblm · Feb 20, 2010

spinoza1111 said:
spinoza1111 said:

spinoza1111<[email protected]> wrote:

Click to expand...

[ snip ]

It's in the fact that he's unqualified,

Click to expand...

[ "he" == Seebs ]

I don't agree with you that this is an established fact.

Click to expand...

So don't agree. However, he's established it:

Click to expand...

"The heap is a DOS term".

Click to expand...

A statement which he later acknowledged was not the best way
of making his intended point. Why keep quoting a phrase that
its author has acknowledged was unclear? (Mostly a rhetorical
question.)

Click to expand...

Because Seebach insists on hounding Schildt and in his latest stunt,
presenting a solution of mine that was out of date.

He wants to be forgiven for his mistakes but won't let others' errors
alone.

Click to expand...

Mistakes that are publicly acknowledged are, in my opinion, easier
to forgive than those that are never acknowledged and sometimes even
repeated.

No, it doesn't. If percent occurs anywhere else in the input the code
will break.

Click to expand...

But if the precondition excludes this case -- as I believe it did [*],
in the rather narrow context for which the code was written --
then the code does meet its spec. One could argue that the spec
was badly designed, but not that the code doesn't meet it.

[*] I could be wrong about this. Perhaps Seebs can easily confirm
this, or not.

If it happens to occur at the end, his code may very well
crash. Other people have presented similarly buggy solutions without a
testing framework like mine and have admitted bugs without, for the
most part, fixing them. Whereas I have responded to each and every bug
report I've seen.

C was the wrong tool for the specific job. I am not saying that I
would have solved the same problem with a fully general replace(),
although I might if I foresaw a need for it elsewhere. Seebach wanted
to establish chops in C, so he presented the solution of a problem
that should have used a language with replace() already available.

Click to expand...

[ snip ]

blmblm · Feb 20, 2010

[ snip ]

A simpler test program. Surely the job of writing and testing
a function to compare two strings is simpler than the problem
under discussion here?

I would argue, by the way, that some of your test data makes the
idea of automated checking especially attractive -- output of
the first test (in the most recent version as best I can tell) is

"1onobbb1onobbb11123bb1ono1111112111ono"

and I'm not sure I trust myself to correctly compare that with
actual output.

Eh. At some point you have to believe that *something* works.

I generally run a simple case where a human checks the output. Or I
run a case that is supposed to fail. Humans are poor at checking
repetitive things so if your test harness output went on for pages or
had to be run many times I'd distrust a purely human checked result. A
human could certify the test subroutine was ok by inspection. This
only has to be done once.

How so? You've described your test suite as meant for regression
testing, no? which implies verifying that the program produces the
same output it produced before, which if it's not being done by a
human is being done by -- what? (I'd use the UNIX "diff" command,
but then I'm relying on its being correctly implemented. I do think
the odds of that are better than the odds of a human being able to
do the same comparison reliably. "Whatever", maybe.)

[ snip ]

Seebs · Feb 20, 2010

But if the precondition excludes this case -- as I believe it did [*],
in the rather narrow context for which the code was written --
then the code does meet its spec. One could argue that the spec
was badly designed, but not that the code doesn't meet it.

[*] I could be wrong about this. Perhaps Seebs can easily confirm
this, or not.

It did. The input space consisted of two strings, defined in the same
module of code, which were compile-time literals.

Spinoza goes on to say:

Actually, no, the program in question really did have good reason to be
in C. It was there to be an extremely lightweight wrapper that set two
environment variables and then executed another program. C was definitely
the right choice.

I cannot conceive of how that could "establish chops in C". It was a
half-hour thing to do while waiting for a build. I don't consider that to
be "establishing chops" any more than an experienced professional pianist
would be "establishing chops" by playing Chopsticks.

Fundamentally, code isn't about reputation, it's about getting things done.
For that task, I needed a language which was not "interpreted" (meaning, the
executable wouldn't be a #! file on a Unix machine), and which had access to
trivial string manipulation, system calls, and the ability to run a very lean
binary which does very little. C was the right tool, so I used it. Languages
with a "replace" feature built in are in general larger and slower (bad fits)
and typically interpreted (categorically unusable in this case).

-s

spinoza1111 · Feb 21, 2010

But if the precondition excludes this case -- as I believe it did [*],
in the rather narrow context for which the code was written --
then the code does meet its spec. One could argue that the spec
was badly designed, but not that the code doesn't meet it.
[*] I could be wrong about this. Perhaps Seebs can easily confirm
this, or not.

Click to expand...

It did. The input space consisted of two strings, defined in the same
module of code, which were compile-time literals.

Spinoza goes on to say:

Actually, no, the program in question really did have good reason to be
in C. It was there to be an extremely lightweight wrapper that set two
environment variables and then executed another program. C was definitely
the right choice.

I cannot conceive of how that could "establish chops in C". It was a
half-hour thing to do while waiting for a build. I don't consider that to
be "establishing chops" any more than an experienced professional pianist
would be "establishing chops" by playing Chopsticks.

You consistently hold yourself to a lower standard than you hold
others, and a gentleman holds himself to a higher standard. You've
made absurd generalizations based on my unnecessary (but not in fact
erroneous, in fact over-cautious) use of malloc.h about "carelessness"
in a case where an extra step (#include malloc.h) was an excess of
care.

Whereas you exhibited with pride code that (its evanescence aside) was
meant to be an example of using C for "newbies" that made a conceptual
error. That conceptual error didn't matter in the trivial project you
were assigned, but I am at a loss to understand why you posted it
other than an unwarranted vanity. You claim that far less serious
errors in Schildt might mislead newbies, yet you misled them here far
more seriously than Schildt.

This was because your error was independent of the laws of C, but was
a design error with regards to potential inputs.

Fundamentally, code isn't about reputation, it's about getting things done.

Oh? It seems to me that you and Heathfield are all about "reputation",
especially when it comes to calling Apress colleagues "morons".

For that task, I needed a language which was not "interpreted" (meaning, the
executable wouldn't be a #! file on a Unix machine), and which had access to
trivial string manipulation, system calls, and the ability to run a very lean
binary which does very little. C was the right tool, so I used it. Languages
with a "replace" feature built in are in general larger and slower (bad fits)
and typically interpreted (categorically unusable in this case).

We still don't understand why you posted one-off code that doesn't
work anywhere else.

spinoza1111 · Feb 21, 2010

But if the precondition excludes this case -- as I believe it did [*],
in the rather narrow context for which the code was written --
then the code does meet its spec. One could argue that the spec
was badly designed, but not that the code doesn't meet it.
[*] I could be wrong about this. Perhaps Seebs can easily confirm
this, or not.

Click to expand...

It did. The input space consisted of two strings, defined in the same
module of code, which were compile-time literals.

Spinoza goes on to say:

Actually, no, the program in question really did have good reason to be
in C. It was there to be an extremely lightweight wrapper that set two
environment variables and then executed another program. C was definitely
the right choice.

I cannot conceive of how that could "establish chops in C". It was a
half-hour thing to do while waiting for a build. I don't consider that to
be "establishing chops" any more than an experienced professional pianist
would be "establishing chops" by playing Chopsticks.

Actually, geniuses like to start with Czerny and Chopsticks to show
what pompous idiots miss. Like Glenn Gould, they prefer, while playing
Czerny etudes, Chopsticks, or a trivial waltz of Diabelli, to
transform them into something that's not boring. Your code was boring,
so I've transformed it into a mighty tree of interesting discussion,
not merely my own replace().

In the film Amadeus, Mozart takes a trivial march by a pompous fool
(Salieri) and transforms it into the brilliant tune that became the
"farfallone amoroso" aria in The Marriage of Figaro: watch it on
YouTube:

Mediocre programmers are only too happy, in my experience, to so
overdefine a problem ("oh, it's just a constant") to get something
done fast, where owing to the financialization of the US economy,
there's never (and I mean never) enough "time" to do it right, since
this might discommode some rich pig of the CEO class who regards
programmers as dirt bags.

They are then like Salieri too anxious to present their trivial
horseshit as exemplary.

Whereas great people in any field like to solve problems at the limit
of the state of the art and their capability. They would ask why we
can always expect % to be followed by s. They would ask, like Mozart,
"have you tried".

I understand, of course, that this means that relative to this
problem, I think I'm Mozart and you're Salieri, and the whole point in
American business is to drive the Mozarts out, or alternatively
convince people that they're not Mozart, just a "mere" programmer no
better than coworkers like you, whose personal conduct I find utterly
cheap and disgusting quite apart from (but related to) your lack of
intellect.

But I've worked with some real Mozarts in their fields including Whit
Diffie and John Nash, and I conclude that there's just an even wider
gulf between you and them, into which I fit with room to spare.

Smart people don't wait for a challenge. They find challenge where
they are since otherwise they are bored. We have found in the
replace() thread that thinking in terms of the C library hides a real
optimization for longer targets that start with common data that
doesn't require state in the form of tables: just keep track, while
matching the target, of the leftmost occurence of the one character
handle, and restart from this. strstr doesn't give you this
information which means that it is unacceptable for certain text
scanning applications.

Now, having seen the film Amadeus and a presentation of the much more
intense stage play recently here in Hong Kong, I understand that in
revenge for Mozart's send up of the Salieri march, Salieri set out to
destroy Mozart in the same way you set out to destroy the reputation
of an Apress colleague. And he succeeded. You won't.

blmblm · Feb 22, 2010

[ snip ]

You consistently hold yourself to a lower standard than you hold
others, and a gentleman holds himself to a higher standard. You've
made absurd generalizations based on my unnecessary (but not in fact
erroneous, in fact over-cautious) use of malloc.h about "carelessness"
in a case where an extra step (#include malloc.h) was an excess of
care.

How did you decide that the #include for malloc.h was necessary, or
prudent? On the platform(s) I know best, when I want to use a function
from the C library, I consult its man page, which tells me what to
#include (if anything), and for malloc, it tells me stdlib.h. (I'm
genuinely curious here, not sniping for sniping's sake -- I don't
think I've ever done C programming on a Microsoft platform and so
don't know what reference material is readily available.)

[ snip ]

Oh? It seems to me that you and Heathfield are all about "reputation",
especially when it comes to calling Apress colleagues "morons".

This is a usage of "colleague" that seems strange to me. So you
regard everyone who has had something published by the people
who published your book as a colleague, even those you've never
met and know nothing about. "Just sayin'", maybe.

[ snip ]

Nick Keighley · Feb 22, 2010

[...] You've
made absurd generalizations based on my unnecessary (but not in fact
erroneous, in fact over-cautious) use of malloc.h about "carelessness"
in a case where an extra step (#include malloc.h) was an excess of
care.

Click to expand...

How did you decide that the #include for malloc.h was necessary, or
prudent? On the platform(s) I know best, when I want to use a function
from the C library, I consult its man page, which tells me what to
#include (if anything), and for malloc, it tells me stdlib.h. (I'm
genuinely curious here, not sniping for sniping's sake -- I don't
think I've ever done C programming on a Microsoft platform and so
don't know what reference material is readily available.)

I googled "MSDN malloc" and got a first hit of
http://msdn.microsoft.com/en-us/library/6ewkz86d(VS.80).aspx

this tells you use *both* stdlib.h and malloc.h. I suppose that
justifies spinoza's "excess of care" claim. Whenever I've used MS
compilers I've never used malloc.h. For standard functions I usually
google the function name. Not foolproof but I can usually spot
anything bizzarely odd. I usually just want something I can copy-paste
the prototype out of.

This is a usage of "colleague" that seems strange to me. So you
regard everyone who has had something published by the people
who published your book as a colleague, even those you've never
met and know nothing about. "Just sayin'", maybe.

google this newsgroup for the word "collegiate"- he uses that in an
odd sense as well

Proposal for a non-periodic CPRNG (WARNING: CROSSPOST!)	4	Apr 28, 2012
Warning on assigning a function-returning-a-pointer-to-arrays	17	Jun 26, 2006
Adapting software to multiple usage patterns	90	Oct 13, 2009
Statement on Schildt submitted to wikipedia today	119	Sep 6, 2009
Proposal for Amendment to Section 6.5.3.2, Unary * Operator	65	Jul 31, 2010
Why I love Python (warning: rambling)	28	Oct 28, 2003
Undefined Behaviour designed to be caught [Was: Books for advancedC++ debugging]	9	Jul 17, 2009
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Mar 1, 2008

Warning to newbies

spinoza1111

David Thompson

Nick Keighley

Seebs

spinoza1111

spinoza1111

spinoza1111

spinoza1111

Nick Keighley

Nick

Ben Bacarisse

Ben Bacarisse

Seebs

blmblm

blmblm

Seebs

spinoza1111

spinoza1111

blmblm

Nick Keighley

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads