S
Seebs
This may have gotten buried given that it started in a Nilges thread,
but it's actually pretty interesting.
There was some discussion of algorithms to perform the task:
Inputs: target string, replacement string, original string
Output: a newly-allocated string containing the original
string with each copy of the target string replaced with the
replacement string.
Here's a hunk of mine:
for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
++count;
t += inlen;
}
Here's a hunk of another one:
These hunks are essentially performing the same part of the core loop;
find the next occurrence of the target string in the original string.
The second one was written by Edward Nilges ("spinoza1111"). He offers
in its defense the assertion that it's the "best" algorithm. (Nevermind
that it's got a bug; the bug could be fixed easily.)
My thinking:
The first is better, not because it's less buggy (that could be fixed), but
because it's simpler to understand. It may, or may not, be less efficient.
However, efficiency will vary widely with input characteristics; for some
cases, the arguable inefficiencies may be completely irrelevant, while in
others, they'd be significant.
But you should always write the simpler one first, and wait until you've
profiled the code and verified that there's a bottleneck, and you know where
it is, before trying to optimize. You may, or may not, be able to outperform
a given library's implementation of strstr(). If you have unusual inputs,
you have a pretty good shot at it... But it may not be worth it. The
complicated example has had at least one bug identified, and there may be
others. It's extremely hard to read, partially because of the poor
naming, and partially because of the counter-idiomatic usage. But mostly,
it's a lot more work to write and maintain, and for no known benefit.
It's good to think a little about efficiency, but write for clarity and
correctness first, so you have a known-working program to check your
results if you later need to modify it for speed.
-s
but it's actually pretty interesting.
There was some discussion of algorithms to perform the task:
Inputs: target string, replacement string, original string
Output: a newly-allocated string containing the original
string with each copy of the target string replaced with the
replacement string.
Here's a hunk of mine:
for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
++count;
t += inlen;
}
Here's a hunk of another one:
ptrIndex1 = strMaster;
while(*ptrIndex1)
{
ptrIndex0 = ptrIndex1;
while (-1)
{
for(;
*ptrIndex1 && *ptrIndex1 != *strTarget;
ptrIndex1++);
for(ptrIndex2 = strTarget, ptrIndex3 = ptrIndex1;
*ptrIndex3
&&
*ptrIndex2
&&
*ptrIndex3 == *ptrIndex2;
ptrIndex3++, ptrIndex2++);
if (!*ptrIndex2) break;
ptrIndex1 = ptrIndex3;
if (!*ptrIndex3) break;
}
These hunks are essentially performing the same part of the core loop;
find the next occurrence of the target string in the original string.
The second one was written by Edward Nilges ("spinoza1111"). He offers
in its defense the assertion that it's the "best" algorithm. (Nevermind
that it's got a bug; the bug could be fixed easily.)
My thinking:
The first is better, not because it's less buggy (that could be fixed), but
because it's simpler to understand. It may, or may not, be less efficient.
However, efficiency will vary widely with input characteristics; for some
cases, the arguable inefficiencies may be completely irrelevant, while in
others, they'd be significant.
But you should always write the simpler one first, and wait until you've
profiled the code and verified that there's a bottleneck, and you know where
it is, before trying to optimize. You may, or may not, be able to outperform
a given library's implementation of strstr(). If you have unusual inputs,
you have a pretty good shot at it... But it may not be worth it. The
complicated example has had at least one bug identified, and there may be
others. It's extremely hard to read, partially because of the poor
naming, and partially because of the counter-idiomatic usage. But mostly,
it's a lot more work to write and maintain, and for no known benefit.
It's good to think a little about efficiency, but write for clarity and
correctness first, so you have a known-working program to check your
results if you later need to modify it for speed.
-s