No, I fixed that. Here is a version just for you, with this test
added.
I normally ignore your stuff, but since this is actual code, I figure it
deserves a bit of a look.
Your various statements about the original example are deeply confused.
There was no "spec" that I short-circuited; I wrote the data and the
code to fit each other. In any event, your code is fascinating.
// * Demonstrates how to replace non-NUL-defined strings in C *
// * using a simple function, and bypassing string.h. *
Congratulations! You're barely into the comments and you've already
made your first obvious mistake. "NUL-defined" isn't even meaningful,
but "NUL-terminated" is... And all three of the arguments to your function
are, in fact, NUL-terminated. You seem to be making much of the lack
of a NUL byte at the end of the substrings, but really, that's not
particularly unusual; strstr() has always handled that just fine.
#include <stdio.h>
#include "malloc.h" // Per RH ess muss sein
No, RH pointed out that "malloc.h" did not need to be included at all,
since it's not a standard header and malloc is adequately declared in
#include <stdlib.h>
// ***** Segmentation *****
struct TYPsegmentstruct
{ char * strSegment;
long lngSegmentLength;
int intReplaceAtEnd;
struct TYPsegmentstruct * ptrNext; };
This is an exceptionally elaborate structure definition. I am also
hugely amused by your decision to make your systems hungarian even LESS
usable out of spite.
// ---------------------------------------------------------------
// Calculate string length
//
//
long strLen(char *strInstring)
{
char *ptrInstring;
for (ptrInstring = strInstring; *ptrInstring; ptrInstring++);
return ptrInstring - strInstring;
}
While this does indeed appear to work, it's shoddy.
First, you should have written the loop as:
for (ptrInstring = strInstring; *ptrInstring; ptrInstring++)
;
Accidentally putting a semicolon on the end of a for loop is a common
typo. The reader is better served by an explicit "loop body" even if
it's just a semicolon.
// ---------------------------------------------------------------
// Replace target by replacement string in master string
//
//
// Caution: the string returned by this function should be freed.
//
//
char * replace(char * strMaster,
char * strTarget,
char * strReplacement)
{
So far, this looks fine.
char * ptrIndex0;
char * ptrIndex1;
char * ptrIndex2;
char * ptrIndex3;
This, by contrast, is just plain stupid. How is the reader supposed to know
what these are used for? This is like a parody of variable naming
conventions. You're so busy telling us that these are "pointer indexes" that
you forget to suggest or even hint at what they'll be used for. It's like
reading a large chunk of numerical processing in which the variables are
all named "mathematicalValue0", "mathematicalValue1", and so on.
char * strNew;
char * strNewStart;
long lngNewLength;
long lngCount;
long lngReplacementLength;
struct TYPsegmentstruct * ptrSegmentStructStarts;
struct TYPsegmentstruct * ptrSegmentStruct;
struct TYPsegmentstruct * ptrSegmentStructPrev;
lngReplacementLength = strlen(strReplacement);
It seems suspiciously likely that you have more variables here than you
actually need to implement this, but of course, it's hard to guess what you
might have been thinking. The TYPsegmentstruct thing seems to be an
exceptionally poor design choice already, though.
if (!*strTarget)
{
printf("Error in calling replace(): target can't be null");
abort();
}
Error handling with a gun.
So. You start with ptrIndex1. You haven't used ptrIndex0 yet. Also,
there is no future reference to strMaster in the code, suggesting that you
don't need the old value for anything -- so storing a copy of it seems
unnecessary. Just use strMaster.
ptrSegmentStructPrev = 0;
lngNewLength = 0;
It's not unheard of to put stuff like this in the variable declaration,
since that's usually easier to keep straight.
while(*ptrIndex1)
{
ptrIndex0 = ptrIndex1;
while (-1)
So you're telling us here that anything I say, no matter how universally
acknowledged it is, you're going to do the other way just out of spite. Okay.
Furthermore, why not actually specify the loop condition as the loop
condition? Oh, that's right. Because you're crazy.
{
for(;
*ptrIndex1 && *ptrIndex1 != *strTarget;
ptrIndex1++);
for(ptrIndex2 = strTarget, ptrIndex3 = ptrIndex1;
*ptrIndex3
&&
*ptrIndex2
&&
*ptrIndex3 == *ptrIndex2;
ptrIndex3++, ptrIndex2++);
if (!*ptrIndex2) break;
ptrIndex1 = ptrIndex3;
if (!*ptrIndex3) break;
}
Here's where you would have benefitted a lot from giving these variables
semantically indicative names.
To summarize this loop, for anyone following along at home and unable to make
sense of this godawful spaghetti code:
until something breaks us out of this loop
advance to the first instance of the first character
in the string to be replaced
point index2 at the string to be replaced, and index3
at the character which was the same as it
iterate as long as both of them are non-NUL and the same.
break if *index2 is NUL (this would mean that we matched
the entire string to be replaced)
point index1 (our location in the master string) at index3
otherwise.
if *index3 is NUL, break there too.
This is massively complicated by a refusal to use the provided tools, but
that alone isn't enough to cover it. This is a gratuitously complicated
and spaghetti-code way of going about doing this.
But whatever. We have gotten out of the loop. If *ptrIndex2 is a nul byte,
we found a thing to replace. Otherwise, we reached the end of the original
string. In either case, ptrIndex1 now points to either just past the thing
to replace, or to the end of the original string.
if (!(ptrSegmentStruct =
malloc(sizeof(struct TYPsegmentstruct))))
abort();
Wonderful error handling.
ptrSegmentStruct->strSegment = ptrIndex0;
ptrSegmentStruct->intReplaceAtEnd = *ptrIndex2 ? 0 : -1;
ptrSegmentStruct->lngSegmentLength =
ptrIndex1 - ptrIndex0;
So, Segment is pointed to the point in the original string where
we started this block. ReplaceAtEnd is set to 0 or nilgesTrue to indicate
whether or not to include the replacement string. The segment length
is the difference between where we are now in the original string
and where we were.
lngNewLength += ptrSegmentStruct->lngSegmentLength +
(!*ptrIndex2
?
lngReplacementLength
:
0);
This really shows a great openness to bugs. An ordinary developer would
have started NewLength as the length of the original string, then maybe
added the difference between the replacement and target lengths every time
a target string was found. But not you! You carefully add the length
of the original string by bits and pieces.
Special credit for the gratuitously elaborate calculation based on *ptrIndex2.
You already determined whether to ReplaceAtEnd; you could have used that
value, rather than recalculating it. But that's okay, because you've inverted
the sense of the comparison; while previously, you had:
*ptrIndex2 ? <no> : <yes>
here you have
!*ptrIndex2 ? <yes> : <no>
That's a great way to keep readers on their toes!
Remember, *any sort of consistency at all* is the hobgoblin of little
minds. Ralph Waldo Emerson said that when I was helping him write poetry
in C.
ptrSegmentStruct->ptrNext = 0;
if (ptrSegmentStructPrev != 0)
ptrSegmentStructPrev->ptrNext = ptrSegmentStruct;
else
ptrSegmentStructStarts = ptrSegmentStruct;
ptrSegmentStructPrev = ptrSegmentStruct;
This is a technically correct way to handle a linked list, but it does
highlight the inappropriateness of the data structure. This is a gratuitously
complicated series of operations.
This is nicely opaque, because you chose completely meaningless names, but
it turns out that:
* if ptrIndex1 had reached the end of the original string, ptrIndex3
was set equal to it and then that loop exited before incrementing
anything, so ptrIndex3 is already the same as ptrIndex1.
* if ptrIndex1 was not at the end of the original string, and we
started the inner loop to see whether we'd found a complete match
for the target string...
* if the match was found, ptrIndex3 points to the end of the match.
* if the match was not found, the loop up at the top kept going.
So either ptrIndex1 already pointed at the end of the original string, and
this is a no-op, or ptrIndex3 pointed to just past the end of an instance
of the target string, and this does indeed move ptrIndex1 to the right
place.
Had you chosen clearer names and not written spaghetti code, this wouldn't
take explanation.
if (!(strNewStart = malloc(lngNewLength + 1))) abort();
Wonderful error handling.
strNew = strNewStart;
ptrSegmentStruct = ptrSegmentStructStarts;
while (ptrSegmentStruct)
{
for (ptrIndex1 = ptrSegmentStruct->strSegment, lngCount = 0;
lngCount < ptrSegmentStruct->lngSegmentLength;
ptrIndex1++, lngCount++, strNew++)
*strNew = *ptrIndex1;
if (ptrSegmentStruct->intReplaceAtEnd)
for (ptrIndex1 = strReplacement;
*ptrIndex1;
ptrIndex1++, ++strNew)
*strNew = *ptrIndex1;
ptrSegmentStructPrev = ptrSegmentStruct;
ptrSegmentStruct = ptrSegmentStruct->ptrNext;
free(ptrSegmentStructPrev);
}
This is an extremely elaborate way to keep track of everything. It's
conceptually interesting. Rather than tracking the offsets in the
original string at which the target string occurred (which is the only
information you need to do this; the other information can be derived
from that trivially), you store a series of pointers, lengths, and
additional flags. This allows you to make a much more elaborate loop
here.
*strNew = '\0';
return strNewStart;
}
This part is, surprisingly, probably correct.
The test harness isn't interesting.
Overall... This is crap. It's massively overengineered, in a way that
carefully creates multiple brand new opportunities for brittleness. Your
inexplicable refusal to use any of the standard tools (such as strlen
or memcpy) hardly goes far enough to explain this.
If you really wanted a linked list structure, you could have done just fine
with:
struct replace_at { int offset; struct replace_at *next; };
If you know the offset, you can start at position zero, and:
* copy everything up to the first offset
* add the replacement string
* skip the target (whose length you already know)
* repeat
* copy whatever's left when you're done
It's a simpler loop which only requires a single data point rather than
three. Since the only way ReplaceAtEnd can be zero is when you've reached
the end of the string, it's obvious that you can use the next pointer as
a surrogate for it in this context.
So, long story short:
* Long
* Invites bugs
* Incredibly poor choice of names (and I'm ignoring your faux "Hungarian")
* Insanely inefficient
* Demonstrably very buggy, given how many iterations you've had.
So far, I'm only aware of one bug in the version I did in ten minutes; you
put hours into this, and more time into debugging it, and while I *think* it
probably works, I sure wouldn't want to bet on it. Were this an actual code
review, this would get a "NAK, please clean up and resubmit".
-s