Finding and Replacing Substrings In A String


DarthBob88

I have to go through a file and replace any occurrences of a given
string with the desired string, like replacing "bug" with "feature".
This is made more complicated by the fact that I have to do this with
a lot of replacements and by the fact that some of the target strings
are two words or more long, so I can't just break up the file at
whitespace, commas, and periods. How's the best way to do this? I've
thought about using strstr() to find the string and strncpy() to
replace it, but it occurs to me that it would screw up the string to
overwrite part of it with strncpy(). How should I do this?
 

Malcolm McLean

DarthBob88 said:
I have to go through a file and replace any occurrences of a given
string with the desired string, like replacing "bug" with "feature".
This is made more complicated by the fact that I have to do this with
a lot of replacements and by the fact that some of the target strings
are two words or more long, so I can't just break up the file at
whitespace, commas, and periods. How's the best way to do this? I've
thought about using strstr() to find the string and strncpy() to
replace it, but it occurs to me that it would screw up the string to
overwrite part of it with strncpy(). How should I do this?
You'll make life a lot easier for yourself if you can specify that the
search string cannot contain newlines.

Load each line. Call strstr() repeatedly to count the number of occurrences
of each target string. Then calculate how much extra memory is required.

(You need to think what happens if one search string is a substring of
another, or contains an overlap)

Allocate another buffer of the right length, not forgetting the terminal
nul. Then do a search and replace. Probably the easiest way to do this is to
have two buffers, search one and replace into the other, iteratively until
you have done all the targets.
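
A minimal sketch of the count-then-allocate approach for one line and one target, assuming the search string is non-empty and matches are taken left to right without overlap. The name replace_all is hypothetical, not something from the posts above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc()ed copy of `line` with every
 * non-overlapping occurrence of `search` replaced by `replace`, or NULL if
 * allocation fails. The caller frees the result. */
char *replace_all(const char *line, const char *search, const char *replace)
{
    size_t s_len = strlen(search);
    size_t r_len = strlen(replace);
    size_t count = 0;
    const char *p;
    char *out, *dst;

    assert(s_len > 0);  /* an empty search string would match everywhere */

    /* Pass 1: count the occurrences so the output size is known. */
    for (p = line; (p = strstr(p, search)) != NULL; p += s_len)
        count++;

    /* Old length, minus what is removed, plus what is added, plus the nul. */
    out = malloc(strlen(line) - count * s_len + count * r_len + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy the text before each match, then the replacement. */
    dst = out;
    while ((p = strstr(line, search)) != NULL) {
        size_t gap = (size_t)(p - line);
        memcpy(dst, line, gap);
        dst += gap;
        memcpy(dst, replace, r_len);
        dst += r_len;
        line = p + s_len;
    }
    strcpy(dst, line);  /* tail after the last match, including the nul */
    return out;
}
```

For several targets you could call this once per target, feeding each result into the next call and freeing the previous buffer. The order of the calls then decides what happens when one target is a substring of another.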
 

Richard Heathfield

Malcolm McLean said:

You'll make life a lot easier for yourself if you can specify that the
search string cannot contain newlines.

This is not in fact necessary. If you're prepared to shift stuff around in
memory a fair bit, all you need is a source buffer twice the size of the
needle. Search for the needle; if you find it, copy everything up to but
not including it to a temporary file, write the replacement needle to the
file, and then move all the subsequent contents of the buffer (i.e. the
stuff following the needle) to its beginning, and replenish it from the
input file. (Newlines are merely more grist to the mill.)

If you *don't* find it, write the first half of the buffer to the temporary
file, and then shift the second half into the first half and replenish
from the input.

When the input is exhausted and you're sure the buffer contains no needles,
write the remainder to the temporary file. Then remove and rename in the
canonical fashion.
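
A rough sketch of that sliding-buffer scheme, assuming a single non-empty needle and already-open streams. The names find and stream_replace are hypothetical, write errors are not checked, and the remove/rename step is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Naive search for needle[0..n) in buf[0..len); returns the offset or -1. */
static long find(const char *buf, size_t len, const char *needle, size_t n)
{
    size_t i;
    for (i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return (long)i;
    return -1;
}

/* Hypothetical helper: copies `in` to `out`, replacing each occurrence of
 * `needle` with `repl`. Returns 0 on success, -1 if malloc() fails. */
int stream_replace(FILE *in, FILE *out, const char *needle, const char *repl)
{
    size_t n = strlen(needle);
    size_t cap = 2 * n;          /* buffer twice the size of the needle */
    size_t len = 0;              /* bytes currently held in buf */
    int eof = 0;
    char *buf;

    assert(n > 0);
    buf = malloc(cap);
    if (buf == NULL)
        return -1;

    for (;;) {
        long at;
        size_t drop;

        if (!eof) {              /* replenish from the input */
            len += fread(buf + len, 1, cap - len, in);
            if (len < cap)
                eof = 1;         /* short read: input is exhausted */
        }
        at = find(buf, len, needle, n);
        if (at >= 0) {           /* text before the needle, then the replacement */
            fwrite(buf, 1, (size_t)at, out);
            fputs(repl, out);
            drop = (size_t)at + n;
        } else if (!eof) {       /* no match can start in the first half */
            fwrite(buf, 1, n, out);
            drop = n;
        } else {                 /* input exhausted and no needle left */
            fwrite(buf, 1, len, out);
            break;
        }
        memmove(buf, buf + drop, len - drop);  /* shift the rest to the front */
        len -= drop;
    }
    free(buf);
    return 0;
}
```

Writing only the first half on a miss is safe because a needle that starts anywhere in the first half would fit entirely inside the full buffer and so would already have been found.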

Depending on just how much data you've got, it might be worth investigating
the Boyer-Moore string searching algorithm, since native strstr
implementations can be a bit dumb.
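
As an illustration only, here is a sketch of the Horspool simplification of Boyer-Moore, which keeps just the bad-character shift table. It is not the full Boyer-Moore algorithm, and the name horspool is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: Horspool search for needle[0..nlen) in hay[0..hlen).
 * Returns a pointer to the first occurrence, or NULL if there is none. */
const char *horspool(const char *hay, size_t hlen,
                     const char *needle, size_t nlen)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    assert(nlen > 0);
    if (nlen > hlen)
        return NULL;

    /* A character that is not in the needle lets us skip a whole needle. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = nlen;
    /* Otherwise shift so its last occurrence (excluding the final needle
     * character) lines up with the text character just examined. */
    for (i = 0; i + 1 < nlen; i++)
        shift[(unsigned char)needle[i]] = nlen - 1 - i;

    pos = 0;
    while (pos + nlen <= hlen) {
        if (memcmp(hay + pos, needle, nlen) == 0)
            return hay + pos;
        /* Shift by the table entry for the text character under the
         * needle's last position. */
        pos += shift[(unsigned char)hay[pos + nlen - 1]];
    }
    return NULL;
}
```

Because it takes explicit lengths, a function like this can search a raw buffer that is not nul-terminated, which strstr() cannot do.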

(You need to think what happens if one search string is a substring of
another, or contains an overlap)

Indeed.
 

Army1987

I have to go through a file and replace any occurrences of a given
string with the desired string, like replacing "bug" with "feature".
This is made more complicated by the fact that I have to do this with
a lot of replacements and by the fact that some of the target strings
are two words or more long, so I can't just break up the file at
whitespace, commas, and periods. How's the best way to do this? I've
thought about using strstr() to find the string and strncpy() to
replace it, but it occurs to me that it would screw up the string to
overwrite part of it with strncpy(). How should I do this?
Try to memmove() the remainder of the string forward, like this:
"This is a bug. \n\0"
"feature" is four characters longer than "bug", so slide the part
of the string starting with the period four characters forward,
then memcpy() "feature" where the 'b' of "bug" was. Probably there
are better ways to do that, try asking in comp.programming.

e.g.
char str[1000] = "This is a bug. \n"
char *search = "bug";
char *replace = "feature";
size_t len = strlen(str);
size_t s_len = strlen(search);
size_t r_len = strlen(replace);
char *current = str;
while (current = strstr(current, search)) /*assignment*/ {
    memmove(current + r_len, current + s_len,
            len - (current - str) - s_len + 1);
    memcpy(current, replace, r_len);
} /*not compiled, not tested. make sure there's enough space past
   *the end of the string in str. */
 

Willem

DarthBob88 wrote:
) I have to go through a file and replace any occurrences of a given
) string with the desired string, like replacing "bug" with "feature".
) This is made more complicated by the fact that I have to do this with
) a lot of replacements and by the fact that some of the target strings
) are two words or more long, so I can't just break up the file at
) whitespace, commas, and periods. How's the best way to do this? I've
) thought about using strstr() to find the string and strncpy() to
) replace it, but it occurs to me that it would screw up the string to
) overwrite part of it with strncpy(). How should I do this?

The Knuth-Morris-Pratt algorithm reads the characters in the searched
string sequentially, one by one. So if you use that algo, you can quite
simply read from the file one char at a time, searching for a match.
Writing to the output should be fairly easy as well, just make sure you
only write characters when they are known to be a mismatch.

You'll have to rely on the system to make it I/O efficient.

After you've got it working, you can always optimize it by dropping in
a platform-specific I/O routine, if needed.
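
One way this could look, as a sketch under some assumptions: a single non-empty needle, non-overlapping matches, and stdio buffering doing the I/O work. The name kmp_replace is hypothetical and write errors are not checked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out` one character at a time,
 * replacing each occurrence of `needle` with `repl`, using the
 * Knuth-Morris-Pratt failure table. Returns 0, or -1 if malloc() fails. */
int kmp_replace(FILE *in, FILE *out, const char *needle, const char *repl)
{
    size_t n = strlen(needle);
    size_t k = 0;               /* needle characters matched and held back */
    size_t i;
    size_t *fail;
    int c;

    assert(n > 0);
    fail = malloc(n * sizeof *fail);
    if (fail == NULL)
        return -1;

    /* fail[i]: length of the longest proper prefix of needle[0..i]
     * that is also a suffix of it. */
    fail[0] = 0;
    for (i = 1; i < n; i++) {
        size_t j = fail[i - 1];
        while (j > 0 && needle[i] != needle[j])
            j = fail[j - 1];
        if (needle[i] == needle[j])
            j++;
        fail[i] = j;
    }

    while ((c = getc(in)) != EOF) {
        /* On a mismatch, fall back to a shorter partial match and write out
         * the held-back characters that can no longer begin a match. */
        while (k > 0 && (unsigned char)needle[k] != c) {
            size_t shorter = fail[k - 1];
            fwrite(needle, 1, k - shorter, out);
            k = shorter;
        }
        if ((unsigned char)needle[k] == c) {
            if (++k == n) {     /* whole needle seen: emit the replacement */
                fputs(repl, out);
                k = 0;
            }
        } else {
            putc(c, out);       /* known mismatch, safe to write */
        }
    }
    fwrite(needle, 1, k, out);  /* partial match cut short by end of input */
    free(fail);
    return 0;
}
```

The held-back text never needs its own buffer: whatever has been matched so far is by definition a prefix of the needle, so it can be written straight from the needle itself.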


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 

Friedrich Dominicus

DarthBob88 said:
I have to go through a file and replace any occurrences of a given
string with the desired string, like replacing "bug" with "feature".
This is made more complicated by the fact that I have to do this with
a lot of replacements and by the fact that some of the target strings
are two words or more long, so I can't just break up the file at
whitespace, commas, and periods. How's the best way to do this? I've
thought about using strstr() to find the string and strncpy() to
replace it, but it occurs to me that it would screw up the string to
overwrite part of it with strncpy(). How should I do this?
Maybe it would be a good idea to look for a library for handling that
kind of stuff? Maybe some regular expression libraries would come in
handy?

Regards
Friedrich
 

Army1987

char str[1000] = "This is a bug. \n"
char *search = "bug";
char *replace = "feature";
size_t len = strlen(str);
size_t s_len = strlen(search);
size_t r_len = strlen(replace);
char *current = str;
while (current = strstr(current, search)) /*assignment*/ {
    memmove(current + r_len, current + s_len,
            len - (current - str) - s_len + 1);
    memcpy(current, replace, r_len);
}
Finding two bugs and correcting them is left as an exercise.
(Hint: one of them only shows up when search is a substring of
replace.)
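
For readers who want to check their answer, here is one possible corrected sketch. It addresses two problems visible in the loop, which may or may not be exactly the two the author had in mind: len is never updated after a replacement, and current is never advanced past the inserted text, so the loop finds the same match again when search occurs inside replace. The function name replace_in_place and the cap parameter are hypothetical additions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replaces every occurrence of `search` in `buf` in
 * place. `cap` is the total size of the array behind `buf`. Returns 0 on
 * success, or -1 if the next replacement would not fit (replacements
 * already made are kept). */
int replace_in_place(char *buf, size_t cap,
                     const char *search, const char *replace)
{
    size_t len = strlen(buf);
    size_t s_len = strlen(search);
    size_t r_len = strlen(replace);
    char *current = buf;

    assert(s_len > 0);
    while ((current = strstr(current, search)) != NULL) {
        /* Bytes after the match, including the terminating nul. */
        size_t tail = len - (size_t)(current - buf) - s_len + 1;

        if (len - s_len + r_len + 1 > cap)
            return -1;                 /* not enough space past the end */
        memmove(current + r_len, current + s_len, tail);
        memcpy(current, replace, r_len);
        len = len - s_len + r_len;     /* keep the length in step */
        current += r_len;              /* resume after the inserted text */
    }
    return 0;
}
```

The original snippet is also missing a semicolon after the initializer of str, which would stop it compiling.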
 

Keith Thompson

Army1987 said:
I have to go through a file and replace any occurrences of a given
string with the desired string, like replacing "bug" with "feature".
This is made more complicated by the fact that I have to do this with
a lot of replacements and by the fact that some of the target strings
are two words or more long, so I can't just break up the file at
whitespace, commas, and periods. How's the best way to do this? I've
thought about using strstr() to find the string and strncpy() to
replace it, but it occurs to me that it would screw up the string to
overwrite part of it with strncpy(). How should I do this?
Try to memmove() the remainder of the string forward, like this:
"This is a bug. \n\0"
"feature" is four characters longer than "bug", so slide the part
of the string starting with the period four characters forward,
then memcpy() "feature" where the 'b' of "bug" was. Probably there
are better ways to do that, try asking in comp.programming.

e.g.
char str[1000] = "This is a bug. \n"
char *search = "bug";
char *replace = "feature";
size_t len = strlen(str);
size_t s_len = strlen(search);
size_t r_len = strlen(replace);
char *current = str;
while (current = strstr(current, search)) /*assignment*/ {
    memmove(current + r_len, current + s_len,
            len - (current - str) - s_len + 1);
    memcpy(current, replace, r_len);
} /*not compiled, not tested. make sure there's enough space past
   *the end of the string in str. */

You're copying the buffer (well, half of it on average) every time you
do a replacement.
 
