Finding and Replacing Substrings In A String

Discussion in 'C Programming' started by DarthBob88, Sep 23, 2007.

  1. DarthBob88

    DarthBob88 Guest

    I have to go through a file and replace any occurrences of a given
    string with the desired string, like replacing "bug" with "feature".
    This is made more complicated by the fact that I have to do this with
    a lot of replacements and by the fact that some of the target strings
    are two words or more long, so I can't just break up the file at
    whitespace, commas, and periods. How's the best way to do this? I've
    thought about using strstr() to find the string and strncpy() to
    replace it, but it occurs to me that it would screw up the string to
    overwrite part of it with strncpy(). How should I do this?
     
    DarthBob88, Sep 23, 2007
    #1

  2. "DarthBob88" <> wrote in message
    news:...
    >I have to go through a file and replace any occurrences of a given
    > string with the desired string, like replacing "bug" with "feature".
    > This is made more complicated by the fact that I have to do this with
    > a lot of replacements and by the fact that some of the target strings
    > are two words or more long, so I can't just break up the file at
    > whitespace, commas, and periods. How's the best way to do this? I've
    > thought about using strstr() to find the string and strncpy() to
    > replace it, but it occurs to me that it would screw up the string to
    > overwrite part of it with strncpy(). How should I do this?
    >

    You'll make life a lot easier for yourself if you can specify that the
    search string cannot contain newlines.

    Load each line. Call strstr() repeatedly to count the number of occurrences
    of each target string. Then calculate how much extra memory is required.

    (You need to think what happens if one search string is a substring of
    another, or contains an overlap)

    Allocate another buffer of the right length, not forgetting the terminal
    nul. Then do a search and replace. Probably the easiest way to do this is to
    have two buffers, search one and replace into the other, iteratively until
    you have done all the targets.
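
A minimal sketch of the count-then-allocate approach described above, for a single target string. The function name replace_all() and its NULL-on-failure convention are illustrative assumptions, not something from the post; running it once per target, feeding each result into the next call, gives the two-buffer iteration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of src in which every
 * non-overlapping occurrence of needle is replaced by repl. Returns NULL if
 * needle is empty or allocation fails. The caller frees the result. */
char *replace_all(const char *src, const char *needle, const char *repl)
{
    size_t n_len, r_len, count = 0;
    const char *p;
    char *out, *dst;

    assert(src != NULL && needle != NULL && repl != NULL);
    n_len = strlen(needle);
    r_len = strlen(repl);
    if (n_len == 0)
        return NULL;

    /* Pass 1: count occurrences so the output size is known up front. */
    for (p = src; (p = strstr(p, needle)) != NULL; p += n_len)
        count++;

    /* Original length, minus the removed text, plus the inserted text,
     * plus one for the terminating nul. */
    out = malloc(strlen(src) - count * n_len + count * r_len + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy the text before each match, then the replacement. */
    dst = out;
    while ((p = strstr(src, needle)) != NULL) {
        size_t gap = (size_t)(p - src);
        memcpy(dst, src, gap);
        dst += gap;
        memcpy(dst, repl, r_len);
        dst += r_len;
        src = p + n_len;
    }
    strcpy(dst, src); /* the tail after the last match, plus the nul */
    return out;
}
```

Because every byte is copied once per target, this avoids shifting the remainder of the string on each replacement. It does not address the substring and overlap question raised above; the order in which targets are applied still matters.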

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
     
    Malcolm McLean, Sep 23, 2007
    #2

  3. Malcolm McLean said:

    <snip>

    > You'll make life a lot easier for yourself if you can specify that the
    > search string cannot contain newlines.


    This is not in fact necessary. If you're prepared to shift stuff around in
    memory a fair bit, all you need is a source buffer twice the size of the
    needle. Search for the needle; if you find it, copy everything up to but
    not including it to a temporary file, write the replacement needle to the
    file, and then move all the subsequent contents of the buffer (i.e. the
    stuff following the needle) to its beginning, and replenish it from the
    input file. (Newlines are merely more grist to the mill.)

    If you *don't* find it, write the first half of the buffer to the temporary
    file, and then shift the second half into the first half and replenish
    from the input.

    When the input is exhausted and you're sure the buffer contains no needles,
    write the remainder to the temporary file. Then remove and rename in the
    canonical fashion.
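
The sliding-buffer scheme above might look roughly like the sketch below. It works on two already-open streams and leaves the temporary-file creation and the remove/rename step to the caller. The names stream_replace() and find_mem() are assumptions made for illustration; find_mem() is a naive stand-in for a length-based search, since the buffer is not a nul-terminated string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Naive length-based search (hypothetical helper). */
static char *find_mem(char *hay, size_t h_len, const char *needle, size_t n_len)
{
    size_t i;
    for (i = 0; i + n_len <= h_len; i++)
        if (memcmp(hay + i, needle, n_len) == 0)
            return hay + i;
    return NULL;
}

/* Copies in to out, replacing each occurrence of needle with repl.
 * Returns 0 on success, -1 on error or an empty needle. */
int stream_replace(FILE *in, FILE *out, const char *needle, const char *repl)
{
    size_t n_len, r_len, cap, used = 0;
    char *buf, *hit;
    int eof = 0;

    assert(in != NULL && out != NULL && needle != NULL && repl != NULL);
    n_len = strlen(needle);
    r_len = strlen(repl);
    cap = 2 * n_len; /* buffer twice the size of the needle */
    if (n_len == 0 || (buf = malloc(cap)) == NULL)
        return -1;

    for (;;) {
        if (!eof) { /* replenish from the input */
            used += fread(buf + used, 1, cap - used, in);
            if (used < cap)
                eof = 1; /* short read: end of input (or an error) */
        }
        hit = find_mem(buf, used, needle, n_len);
        if (hit != NULL) {
            size_t before = (size_t)(hit - buf);
            fwrite(buf, 1, before, out);  /* text up to the needle */
            fwrite(repl, 1, r_len, out);  /* the replacement */
            used -= before + n_len;
            memmove(buf, hit + n_len, used); /* rest goes to the front */
        } else if (eof) {
            fwrite(buf, 1, used, out);    /* no needles left: flush */
            break;
        } else {
            /* Full buffer, no match: a needle starting in the first half
             * would lie wholly inside the buffer, so that half is safe. */
            fwrite(buf, 1, n_len, out);
            used -= n_len;
            memmove(buf, buf + n_len, used);
        }
    }
    free(buf);
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

A buffer this small keeps the sketch close to the description; a real program would more likely use a buffer of several kilobytes and apply the same "keep the last needle-length bytes" rule.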

    Depending on just how much data you've got, it might be worth investigating
    the Boyer-Moore string searching algorithm, since native strstr
    implementations can be a bit dumb.
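
For a feel of how that family of algorithms skips ahead, here is a sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm). The function name horspool_find() and the length-based interface are assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of needle in hay, or NULL.
 * Lengths are passed explicitly, so the data need not be nul-terminated. */
const char *horspool_find(const char *hay, size_t h_len,
                          const char *needle, size_t n_len)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos = 0;

    assert(hay != NULL && needle != NULL);
    if (n_len == 0 || h_len < n_len)
        return NULL;

    /* A byte that does not occur in the needle allows a full-length skip. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = n_len;
    /* Otherwise skip so that its last occurrence (excluding the needle's
     * final byte) lines up with the current window's final byte. */
    for (i = 0; i + 1 < n_len; i++)
        shift[(unsigned char)needle[i]] = n_len - 1 - i;

    while (pos + n_len <= h_len) {
        if (memcmp(hay + pos, needle, n_len) == 0)
            return hay + pos;
        pos += shift[(unsigned char)hay[pos + n_len - 1]];
    }
    return NULL;
}
```

Whether this beats a given library's strstr() depends on the implementation and the data, so it is worth measuring before swapping it in.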

    <snip>

    > (You need to think what happens if one search string is a substring of
    > another, or contains an overlap)


    Indeed.

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
     
    Richard Heathfield, Sep 23, 2007
    #3
  4. Army1987

    Army1987 Guest

    On Sun, 23 Sep 2007 06:22:23 +0000, DarthBob88 wrote:

    > I have to go through a file and replace any occurrences of a given
    > string with the desired string, like replacing "bug" with "feature".
    > This is made more complicated by the fact that I have to do this with
    > a lot of replacements and by the fact that some of the target strings
    > are two words or more long, so I can't just break up the file at
    > whitespace, commas, and periods. How's the best way to do this? I've
    > thought about using strstr() to find the string and strncpy() to
    > replace it, but it occurs to me that it would screw up the string to
    > overwrite part of it with strncpy(). How should I do this?

    Try to memmove() the remainder of the string forward, like this:
    "This is a bug. \n\0"
    "feature" is four characters longer than "bug", so slide the part
    of the string starting with the period four characters forward,
    then memcpy() "feature" where the 'b' of "bug" was. Probably there
    are better ways to do that, try asking in comp.programming.

    e.g.
    char str[1000] = "This is a bug. \n";
    char *search = "bug";
    char *replace = "feature";
    size_t len = strlen(str);
    size_t s_len = strlen(search);
    size_t r_len = strlen(replace);
    char *current = str;
    while (current = strstr(current, search)) /*assignment*/ {
        memmove(current + r_len, current + s_len,
                len - (current - str) - s_len + 1);
        memcpy(current, replace, r_len);
    } /*not compiled, not tested. make sure there's enough space past
       *the end of the string in str. */
    --
    Army1987 (Replace "NOSPAM" with "email")
    A hamburger is better than nothing.
    Nothing is better than eternal happiness.
    Therefore, a hamburger is better than eternal happiness.
     
    Army1987, Sep 23, 2007
    #4
  5. Willem

    Willem Guest

    DarthBob88 wrote:
    ) I have to go through a file and replace any occurrences of a given
    ) string with the desired string, like replacing "bug" with "feature".
    ) This is made more complicated by the fact that I have to do this with
    ) a lot of replacements and by the fact that some of the target strings
    ) are two words or more long, so I can't just break up the file at
    ) whitespace, commas, and periods. How's the best way to do this? I've
    ) thought about using strstr() to find the string and strncpy() to
    ) replace it, but it occurs to me that it would screw up the string to
    ) overwrite part of it with strncpy(). How should I do this?

    The Knuth-Morris-Pratt algorithm reads the characters in the searched
    string sequentially, one by one. So if you use that algo, you can quite
    simply read from the file one char at a time, searching for a match.
    Writing to the output should be fairly easy as well, just make sure you
    only write characters when they are known to be a mismatch.

    You'll have to rely on the system to make it I/O efficient.

    After you've got it working, you can always optimize it by dropping in
    a platform-specific I/O routine, if needed.
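
One way the character-at-a-time idea could be sketched, for a single search string. The function name kmp_replace() and the return convention are assumptions for illustration. The key point is that the only text held back from the output is the currently matched prefix of the needle, so it can be regenerated from the needle itself and no input buffer is needed beyond what stdio provides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies in to out, replacing each occurrence of needle with repl.
 * Returns 0 on success, -1 on error or an empty needle. */
int kmp_replace(FILE *in, FILE *out, const char *needle, const char *repl)
{
    size_t n_len, i, k;
    size_t *fail;
    int c;

    assert(in != NULL && out != NULL && needle != NULL && repl != NULL);
    n_len = strlen(needle);
    if (n_len == 0 || (fail = malloc(n_len * sizeof *fail)) == NULL)
        return -1;

    /* fail[i]: length of the longest proper prefix of needle[0..i]
     * that is also a suffix of it. */
    fail[0] = 0;
    for (i = 1, k = 0; i < n_len; i++) {
        while (k > 0 && needle[i] != needle[k])
            k = fail[k - 1];
        if (needle[i] == needle[k])
            k++;
        fail[i] = k;
    }

    k = 0; /* number of needle characters currently matched */
    while ((c = getc(in)) != EOF) {
        while (k > 0 && (unsigned char)needle[k] != c) {
            /* Falling back: the first k - fail[k-1] held-back characters
             * are now known not to start a match, so write them out. */
            size_t keep = fail[k - 1];
            fwrite(needle, 1, k - keep, out);
            k = keep;
        }
        if ((unsigned char)needle[k] == c)
            k++;          /* still matching: hold the character back */
        else
            putc(c, out); /* k is 0 here: a plain mismatch */
        if (k == n_len) { /* full match */
            fputs(repl, out);
            k = 0;
        }
    }
    fwrite(needle, 1, k, out); /* partial match pending at end of input */
    free(fail);
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

Resetting to zero after a full match means occurrences are treated as non-overlapping, which seems the natural choice for replacement.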


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
     
    Willem, Sep 23, 2007
    #5
  6. DarthBob88 <> writes:

    > I have to go through a file and replace any occurrences of a given
    > string with the desired string, like replacing "bug" with "feature".
    > This is made more complicated by the fact that I have to do this with
    > a lot of replacements and by the fact that some of the target strings
    > are two words or more long, so I can't just break up the file at
    > whitespace, commas, and periods. How's the best way to do this? I've
    > thought about using strstr() to find the string and strncpy() to
    > replace it, but it occurs to me that it would screw up the string to
    > overwrite part of it with strncpy(). How should I do this?

    Maybe it would be a good idea to look for a library for handling that
    kind of stuff? Maybe some regular expression libraries would come in
    handy?

    Regards
    Friedrich

    --
    Please remove just-for-news- to reply via e-mail.
     
    Friedrich Dominicus, Sep 23, 2007
    #6
  7. Army1987

    Army1987 Guest

    On Sun, 23 Sep 2007 12:08:58 +0200, Army1987 wrote:
    > char str[1000] = "This is a bug. \n";
    > char *search = "bug";
    > char *replace = "feature";
    > size_t len = strlen(str);
    > size_t s_len = strlen(search);
    > size_t r_len = strlen(replace);
    > char *current = str;
    > while (current = strstr(current, search)) /*assignment*/ {
    > memmove(current + r_len , current + s_len,
    > len - (current - str) - s_len + 1);
    > memcpy(current, replace, r_len);
    > }

    Finding two bugs and correcting them is left as an exercise.
    (Hint: one of them only shows up when search is a substring of
    replace.)
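
One possible reading of the exercise, sketched as a function: the quoted loop never updates len after a replacement, and it resumes searching at the start of the text it just wrote, so it never terminates when search occurs inside replace. The name replace_in_place() and the explicit capacity parameter are assumptions added for illustration.

```c
#include <assert.h>
#include <string.h>

/* Replaces every occurrence of search in str, in place. cap is the size of
 * the array holding str. Returns 0 on success, or -1 if search is empty or
 * the next replacement would not fit (replacements already made stay). */
int replace_in_place(char *str, size_t cap,
                     const char *search, const char *replace)
{
    size_t len, s_len, r_len;
    char *current = str;

    assert(str != NULL && search != NULL && replace != NULL);
    len = strlen(str);
    s_len = strlen(search);
    r_len = strlen(replace);
    if (s_len == 0)
        return -1;

    while ((current = strstr(current, search)) != NULL) {
        /* Characters after the match, not counting the terminating nul. */
        size_t tail = len - (size_t)(current - str) - s_len;

        if (len - s_len + r_len + 1 > cap)
            return -1; /* not enough space past the end of the string */

        memmove(current + r_len, current + s_len, tail + 1);
        memcpy(current, replace, r_len);
        len = len - s_len + r_len; /* keep len in step with the string */
        current += r_len;          /* resume after the replacement */
    }
    return 0;
}
```

With a 1000-byte array holding "This is a bug. \n", calling replace_in_place(str, sizeof str, "bug", "debug") now terminates, since the search resumes past the inserted "debug".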
    --
    Army1987 (Replace "NOSPAM" with "email")
    A hamburger is better than nothing.
    Nothing is better than eternal happiness.
    Therefore, a hamburger is better than eternal happiness.
     
    Army1987, Sep 23, 2007
    #7
  8. Army1987 <> writes:
    > On Sun, 23 Sep 2007 06:22:23 +0000, DarthBob88 wrote:
    >> I have to go through a file and replace any occurrences of a given
    >> string with the desired string, like replacing "bug" with "feature".
    >> This is made more complicated by the fact that I have to do this with
    >> a lot of replacements and by the fact that some of the target strings
    >> are two words or more long, so I can't just break up the file at
    >> whitespace, commas, and periods. How's the best way to do this? I've
    >> thought about using strstr() to find the string and strncpy() to
    >> replace it, but it occurs to me that it would screw up the string to
    >> overwrite part of it with strncpy(). How should I do this?

    > Try to memmove() the remainder of the string forward, like this:
    > "This is a bug. \n\0"
    > "feature" is four characters longer than "bug", so slide the part
    > of the string starting with the period four characters forward,
    > then memcpy() "feature" where the 'b' of "bug" was. Probably there
    > are better ways to do that, try asking in comp.programming.
    >
    > e.g.
    > char str[1000] = "This is a bug. \n";
    > char *search = "bug";
    > char *replace = "feature";
    > size_t len = strlen(str);
    > size_t s_len = strlen(search);
    > size_t r_len = strlen(replace);
    > char *current = str;
    > while (current = strstr(current, search)) /*assignment*/ {
    > memmove(current + r_len , current + s_len,
    > len - (current - str) - s_len + 1);
    > memcpy(current, replace, r_len);
    > } /*not compiled, not tested. make sure there's enough space past
    > *the end of the string in str. */


    You're copying the buffer (well, half of it on average) every time you
    do a replacement.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Sep 23, 2007
    #8