Search and replace c-style strings in c++

Discussion in 'C++' started by Mike Semko, Apr 25, 2012.

  1. Mike Semko

    Mike Semko Guest

    I am trying to write a function that takes three c-style strings, and
    returns a c-style string. This function searches a c-string for all
    occurrences of the sub-string and replaces them with a different
    string.
    This program works but seems very inelegant. I can't help the feeling
    like it could have been done in a less bulky way.


    #include <iostream>
    #include <cstring>

    using namespace std;

    int main()
    {

    char* replaceSubstring(char*,char*,char*);

    char *string1, *string2, *string3;
    string1 = "the dog jumped over the fence";
    string2 = "the";
    string3 = "that";
    cout << replaceSubstring(string1,string2,string3);

    }


    char* replaceSubstring(char *original, char *from, char *to)
    {
    int origlen = strlen(original);
    int i = 0;
    int count = 0;
    char *ptr;
    while (i<origlen)
    {
    ptr = strstr(original+i, from);
    if (!ptr)
    break;
    else
    {
    i = ptr - original + 1;
    count++;
    }
    }
    int newsize = origlen + (strlen(to) - strlen(from)) * count;
    char *newstring = new char[newsize];
    newstring[0] = '\0';
    i = 0;
    while (i < origlen)
    {
    ptr = strstr(original+i, from);
    if (!ptr)
    {
    strcat(newstring,original+i);
    break;
    }
    else
    {
    strncat(newstring, original+i,ptr-(original+i));
    strcat(newstring, to);
    i = i + ptr - (original + i) + strlen(from);
    }
    }
    strcat(newstring,"\0");
    return newstring;
    }

    Would anyone have any suggestions on how to make this code clearer and/
    or more efficient ? Any comments are welcome. Please do not suggest to
    use class string instead. That is not an option. The function must
    work with c-strings.
     
    Mike Semko, Apr 25, 2012
    #1
    1. Advertising

  2. Mike Semko

    Ian Collins Guest

    On 04/25/12 11:06 AM, Mike Semko wrote:
    > I am trying to write a function that takes three c-style strings, and
    > returns a c-style string. This function searches a c-string for all
    > occurrences of the sub-string and replaces them with a different
    > string.
    > This program works but seems very inelegant. I can't help the feeling
    > like it could have been done in a less bulky way.
    >
    >
    > #include<iostream>
    > #include<cstring>
    >
    > using namespace std;
    >
    > int main()
    > {
    >
    > char* replaceSubstring(char*,char*,char*);
    >
    > char *string1, *string2, *string3;
    > string1 = "the dog jumped over the fence";
    > string2 = "the";
    > string3 = "that";


    These should be const char*, not char*.

    > cout<< replaceSubstring(string1,string2,string3);
    >
    > }
    >
    >
    > char* replaceSubstring(char *original, char *from, char *to)
    > {


    to and from should be const.

    <snip>
    >
    > Would anyone have any suggestions on how to make this code clearer and/
    > or more efficient ? Any comments are welcome. Please do not suggest to
    > use class string instead. That is not an option. The function must
    > work with c-strings.


    Use std::string internally and copy the result to a C style string.

    --
    Ian Collins
     
    Ian Collins, Apr 25, 2012
    #2
    1. Advertising

  3. Mike Semko

    Mike Semko Guest


    > Use std::string internally and copy the result to a C style string.


    Yes I thought about that option, but i am trying to practice using c-
    strings specifically. Assuming I can't use std::string is there any
    way to make this existing code more concise. Maybe I am overlooking
    something. The way I have to figure out the new length and dynamically
    create a string seems ugly at best.
     
    Mike Semko, Apr 25, 2012
    #3
  4. Mike Semko

    Ian Collins Guest

    On 04/25/12 11:38 AM, Mike Semko wrote:

    Please don't snip attributions, it's rude.

    > I wrote:


    >> Use std::string internally and copy the result to a C style string.

    >
    > Yes I thought about that option, but i am trying to practice using c-
    > strings specifically.


    Well unless you are learning C, you should take advantage of the
    improved tools C++ offers you. Your code demonstrates how unpleasant
    string manipulation is in C!

    > Assuming I can't use std::string is there any
    > way to make this existing code more concise. Maybe I am overlooking
    > something. The way I have to figure out the new length and dynamically
    > create a string seems ugly at best.


    You have to dynamically create the new string because your result has to
    be consistent, the caller has to know whether or not to free the result.
    To get the size of the new string, you have to do what you have done.

    You could save some time by stopping your replace loop once count
    instances have been replaced (especially if count==0!). Extracting the
    two while loops into their own functions would make the code's logic
    clearer.

    --
    Ian Collins
     
    Ian Collins, Apr 25, 2012
    #4
  5. Mike Semko

    Ian Collins Guest

    On 04/25/12 12:00 PM, Scott Lurndal wrote:
    > Mike Semko<> writes:<snip>



    >> int newsize = origlen + (strlen(to) - strlen(from)) * count;
    >> char *newstring = new char[newsize];

    >
    > Memory leak.


    No, it's not. It is the value returned by the function.

    --
    Ian Collins
     
    Ian Collins, Apr 25, 2012
    #5
  6. Mike Semko

    Mike Semko Guest

    On Apr 24, 8:08 pm, Ian Collins <> wrote:
    > On 04/25/12 11:38 AM, Mike Semko wrote:
    >
    > Please don't snip attributions, it's rude.


    I apologize, I wasn't aware that was a problem, as I have not posted
    to a newsgroup before.

    <snip>

    > You have to dynamically create the new string because your result has to
    > be consistent, the caller has to know whether or not to free the result.
    >   To get the size of the new string, you have to do what you have done.
    >
    > You could save some time by stopping your replace loop once count
    > instances have been replaced (especially if count==0!).  Extractingthe
    > two while loops into their own functions would make the code's logic
    > clearer.


    Some people have suggested to me to just create a string of a large
    size (like 2048) to use as a new string.
    Do you think that is something I should have done ?
     
    Mike Semko, Apr 25, 2012
    #6
  7. Mike Semko

    Ian Collins Guest

    On 04/25/12 12:32 PM, Mike Semko wrote:
    > On Apr 24, 8:08 pm, Ian Collins<> wrote:
    >> On 04/25/12 11:38 AM, Mike Semko wrote:

    >
    >> You have to dynamically create the new string because your result has to
    >> be consistent, the caller has to know whether or not to free the result.
    >> To get the size of the new string, you have to do what you have done.
    >>
    >> You could save some time by stopping your replace loop once count
    >> instances have been replaced (especially if count==0!). Extracting the
    >> two while loops into their own functions would make the code's logic
    >> clearer.

    >
    > Some people have suggested to me to just create a string of a large
    > size (like 2048) to use as a new string.
    > Do you think that is something I should have done ?


    Consider what happens if an equally large or larger string is passed in!

    --
    Ian Collins
     
    Ian Collins, Apr 25, 2012
    #7
  8. Mike Semko

    Mike Semko Guest

    On Apr 24, 8:35 pm, Ian Collins <> wrote:
    > On 04/25/12 12:32 PM, Mike Semko wrote:
    >
    > > On Apr 24, 8:08 pm, Ian Collins<>  wrote:
    > >> On 04/25/12 11:38 AM, Mike Semko wrote:

    >


    >
    > Consider what happens if an equally large or larger string is passed in!


    Yes of course that would cause a buffer overflow.
    Another thing I have considered is to calculate the maximum required
    size of the string like this:

    if ((strlen(to) > strlen(from))
    newlength = strlen(original) + strlen(original)/strlen(from) *
    (strlen(from) - strlen(to))

    But now this looks even uglier and more unreadable than before :(
    What do you think about this ?
     
    Mike Semko, Apr 25, 2012
    #8
  9. Mike Semko

    Miles Bader Guest

    (Scott Lurndal) writes:
    > if the input strings may be utf-8 instead of ISO-8859-1 or ASCII, then
    > you've more work.


    Hmm, why?

    Valid UTF-8 sequences are context independent, and can't be
    subsequences of other valid UTF-8 sequences, so it seems like a simple
    string-replace should work fine with them, without any special
    handling.

    [In some cases, the result may not make sense at a higher level, e.g.,
    if context-dependent unicode characters are used, but it should still
    be a valid UTF-8 string.]

    -miles

    --
    Come now, if we were really planning to harm you, would we be waiting here,
    beside the path, in the very darkest part of the forest?
     
    Miles Bader, Apr 25, 2012
    #9
  10. Scott Lurndal <> wrote:
    > Ian Collins <> writes:
    >>On 04/25/12 12:00 PM, Scott Lurndal wrote:
    >>> Mike Semko<> writes:<snip>

    >>
    >>
    >>>> int newsize = origlen + (strlen(to) - strlen(from)) * count;
    >>>> char *newstring = new char[newsize];
    >>>
    >>> Memory leak.

    >>
    >>No, it's not. It is the value returned by the function.
    >>

    >
    > Which never got free'd in the caller (granted which exited directly, but
    > that was just sample code).


    Returning unmanaged allocated memory from a function is a BAD idea and
    should always be avoided. (The only exception to this is if it's done by
    a private method of a class, which gets called only by other methods of
    said class, which take care of managing the returned allocation. Even then
    it requires care to avoid mistakes, and in practice it's rare to need to
    do this.)
     
    Juha Nieminen, Apr 26, 2012
    #10
  11. Mike Semko

    none Guest

    In article <4f98fec4$0$3095$>,
    Juha Nieminen <> wrote:
    >Scott Lurndal <> wrote:
    >> Ian Collins <> writes:
    >>>On 04/25/12 12:00 PM, Scott Lurndal wrote:
    >>>> Mike Semko<> writes:<snip>
    >>>
    >>>
    >>>>> int newsize = origlen + (strlen(to) - strlen(from)) * count;
    >>>>> char *newstring = new char[newsize];
    >>>>
    >>>> Memory leak.
    >>>
    >>>No, it's not. It is the value returned by the function.
    >>>

    >>
    >> Which never got free'd in the caller (granted which exited directly, but
    >> that was just sample code).

    >
    > Returning unmanaged allocated memory from a function is a BAD idea and
    >should always be avoided.


    Wow, that's a bit strong. Given that the OP code is essentially C,
    this is a long accepted idiom in pure C. I totally agree that this is
    error prone and that there are better ways but always be avoided is a
    bit OTT (even if the OP code is a bit OT and should really be in
    comp.language.c is he really doesn't want to use C++).

    > (The only exception to this is if it's done by
    >a private method of a class, which gets called only by other methods of
    >said class, which take care of managing the returned allocation. Even then
    >it requires care to avoid mistakes, and in practice it's rare to need to
    >do this.)
     
    none, Apr 26, 2012
    #11
  12. Yannick Tremblay <yatremblay@bel1lin202.(none)> wrote:
    >> Returning unmanaged allocated memory from a function is a BAD idea and
    >>should always be avoided.

    >
    > Wow, that's a bit strong. Given that the OP code is essentially C,
    > this is a long accepted idiom in pure C.


    That's because C offers no tools to manage such allocations. C++ does
    offer such tools and thus isn't bound to such limitations.

    Making a function return unmanaged allocated memory is only asking for
    trouble. It's way too easy to forget to delete the memory in the calling
    code. Also, it's usually not exception-safe.

    (In order for this scheme to be exception-safe, the function that's
    allocating the memory would have to either make sure no exceptions are
    thrown, or have automatic lifetime management of the allocated memory.
    Moreover, the calling code would have to immediately assign the returned
    pointer to a smart pointer or otherwise manage its lifetime immediately.
    Otherwise any exception thrown between allocation and deletion will cause
    it to leak. If the code is managing the memory anyways, then why not have
    the function return managed memory outright? Errors will be minimized
    and the code becomes simpler and safer.)
     
    Juha Nieminen, Apr 26, 2012
    #12
  13. none <yatremblay@bel1lin202.> wrote:
    > In article <4f98fec4$0$3095$>,
    > Juha Nieminen <> wrote:

    <snip>
    >> Returning unmanaged allocated memory from a function is a BAD idea and
    >> should always be avoided.

    >
    > Wow, that's a bit strong. Given that the OP code is essentially C,
    > this is a long accepted idiom in pure C. I totally agree that this is
    > error prone and that there are better ways but always be avoided is a
    > bit OTT (even if the OP code is a bit OT and should really be in
    > comp.language.c is he really doesn't want to use C++).


    I think in C it is much more common to pass a target buffer as parameter
    and returning the number of chars written to that buffer, like:

    size_t replaceSubstring(char* targetBuffer, size_t targetBufferSize, const
    char* original, const char* find, const char* replace);

    Optionally returning the requested buffer size if targetBuffer==NULL or
    targetBufferSize==0.

    Tobi
     
    Tobias Müller, Apr 26, 2012
    #13
  14. Mike Semko

    Miles Bader Guest

    Tobias Müller <> writes:
    >>> Returning unmanaged allocated memory from a function is a BAD
    >>> idea and should always be avoided.

    >>
    >> Wow, that's a bit strong. Given that the OP code is essentially C,
    >> this is a long accepted idiom in pure C. I totally agree that this
    >> is error prone and that there are better ways but always be avoided
    >> is a bit OTT (even if the OP code is a bit OT and should really be
    >> in comp.language.c is he really doesn't want to use C++).

    >
    > I think in C it is much more common to pass a target buffer as parameter
    > and returning the number of chars written to that buffer, like:


    That's the usual "ye olde" C style, but malloc'd returned buffers seem
    increasingly common in modern code...

    -miles

    --
    "Don't just question authority,
    Don't forget to question me."
    -- Jello Biafra
     
    Miles Bader, Apr 26, 2012
    #14
  15. Mike Semko

    Pavel Guest

    Mike Semko wrote:
    > I am trying to write a function that takes three c-style strings, and
    > returns a c-style string. This function searches a c-string for all
    > occurrences of the sub-string and replaces them with a different
    > string.
    > This program works but seems very inelegant. I can't help the feeling
    > like it could have been done in a less bulky way.
    >
    >
    > #include<iostream>
    > #include<cstring>
    >
    > using namespace std;
    >
    > int main()
    > {
    >
    > char* replaceSubstring(char*,char*,char*);
    >
    > char *string1, *string2, *string3;
    > string1 = "the dog jumped over the fence";
    > string2 = "the";
    > string3 = "that";
    > cout<< replaceSubstring(string1,string2,string3);
    >
    > }
    >
    >
    > char* replaceSubstring(char *original, char *from, char *to)
    > {
    > int origlen = strlen(original);
    > int i = 0;
    > int count = 0;
    > char *ptr;
    > while (i<origlen)
    > {
    > ptr = strstr(original+i, from);
    > if (!ptr)
    > break;
    > else
    > {
    > i = ptr - original + 1;
    > count++;
    > }
    > }
    > int newsize = origlen + (strlen(to) - strlen(from)) * count;
    > char *newstring = new char[newsize];
    > newstring[0] = '\0';
    > i = 0;
    > while (i< origlen)
    > {
    > ptr = strstr(original+i, from);
    > if (!ptr)
    > {
    > strcat(newstring,original+i);
    > break;
    > }
    > else
    > {
    > strncat(newstring, original+i,ptr-(original+i));
    > strcat(newstring, to);
    > i = i + ptr - (original + i) + strlen(from);
    > }
    > }
    > strcat(newstring,"\0");
    > return newstring;
    > }
    >
    > Would anyone have any suggestions on how to make this code clearer and/
    > or more efficient ? Any comments are welcome. Please do not suggest to
    > use class string instead. That is not an option. The function must
    > work with c-strings.


    I think adding 1 is a bug: consider original="aaaa", from="aa", to="bbb"; you
    need to add strlen(from). Below is my attempt to be "elegant" (although, not as
    fast as possible; these two rarely go together): Disclaimer: I did not even try
    to compile this:

    char *
    replaceSubstring(const char *original, const char *from, const char *to)
    {
    assert(original);
    assert(from);
    assert(to);
    assert(*from);
    int count = 0;
    const int fromLen = strlen(from);
    for (char *cur = *original; cur = strstr(cur, from); cur += fromLen)
    ++count;
    const int toLen = srtrlen(to);
    char *origRes = new char[strlen(original) + (toLen - fromLen) * count];
    for (char *res = origRes; cur = strstr(original, from);
    original = cur + fromLen) {
    memcpy(res, original, cur - original);
    res += (cur - original);
    memcpy(res, cur, fromLen);
    res += fromLen;
    }
    strcpy(res, original);
    return origRes;
    }

    HTH,
    Pavel
     
    Pavel, Apr 27, 2012
    #15
  16. Mike Semko

    Pavel Guest

    Pavel wrote:
    > Mike Semko wrote:
    >> I am trying to write a function that takes three c-style strings, and
    >> returns a c-style string. This function searches a c-string for all
    >> occurrences of the sub-string and replaces them with a different
    >> string.
    >> This program works but seems very inelegant. I can't help the feeling
    >> like it could have been done in a less bulky way.
    >>
    >>
    >> #include<iostream>
    >> #include<cstring>
    >>
    >> using namespace std;
    >>
    >> int main()
    >> {
    >>
    >> char* replaceSubstring(char*,char*,char*);
    >>
    >> char *string1, *string2, *string3;
    >> string1 = "the dog jumped over the fence";
    >> string2 = "the";
    >> string3 = "that";
    >> cout<< replaceSubstring(string1,string2,string3);
    >>
    >> }
    >>
    >>
    >> char* replaceSubstring(char *original, char *from, char *to)
    >> {
    >> int origlen = strlen(original);
    >> int i = 0;
    >> int count = 0;
    >> char *ptr;
    >> while (i<origlen)
    >> {
    >> ptr = strstr(original+i, from);
    >> if (!ptr)
    >> break;
    >> else
    >> {
    >> i = ptr - original + 1;
    >> count++;
    >> }
    >> }
    >> int newsize = origlen + (strlen(to) - strlen(from)) * count;
    >> char *newstring = new char[newsize];
    >> newstring[0] = '\0';
    >> i = 0;
    >> while (i< origlen)
    >> {
    >> ptr = strstr(original+i, from);
    >> if (!ptr)
    >> {
    >> strcat(newstring,original+i);
    >> break;
    >> }
    >> else
    >> {
    >> strncat(newstring, original+i,ptr-(original+i));
    >> strcat(newstring, to);
    >> i = i + ptr - (original + i) + strlen(from);
    >> }
    >> }
    >> strcat(newstring,"\0");
    >> return newstring;
    >> }
    >>
    >> Would anyone have any suggestions on how to make this code clearer and/
    >> or more efficient ? Any comments are welcome. Please do not suggest to
    >> use class string instead. That is not an option. The function must
    >> work with c-strings.

    >
    > I think adding 1 is a bug: consider original="aaaa", from="aa", to="bbb"; you
    > need to add strlen(from). Below is my attempt to be "elegant" (although, not as
    > fast as possible; these two rarely go together): Disclaimer: I did not even try
    > to compile this:
    >
    > char *
    > replaceSubstring(const char *original, const char *from, const char *to)
    > {
    > assert(original);
    > assert(from);
    > assert(to);
    > assert(*from);
    > int count = 0;
    > const int fromLen = strlen(from);
    > for (char *cur = *original; cur = strstr(cur, from); cur += fromLen)
    > ++count;
    > const int toLen = srtrlen(to);
    > char *origRes = new char[strlen(original) + (toLen - fromLen) * count];
    > for (char *res = origRes; cur = strstr(original, from);
    > original = cur + fromLen) {
    > memcpy(res, original, cur - original);
    > res += (cur - original);
    > memcpy(res, cur, fromLen);
    > res += fromLen;
    > }
    > strcpy(res, original);
    > return origRes;
    > }
    >
    > HTH,
    > Pavel

    Looking back at this code, I missed +1 at operator new call and a couple of
    `const' qualifiers on char pointers. Otherwise, looks valid.

    -Pavel
     
    Pavel, Apr 28, 2012
    #16
  17. In article <>,
    Mike Semko <> wrote:
    >I am trying to write a function that takes three c-style strings, and
    >returns a c-style string. This function searches a c-string for all
    >occurrences of the sub-string and replaces them with a different
    >string.
    >This program works but seems very inelegant. I can't help the feeling
    >like it could have been done in a less bulky way.


    Less bulky .. not very much, but you could do it more efficiently.
    Instead of doing a strstr() on every character, you could do the
    strstr() only just as many times as the "from" string occurs.
    Also you could make the replace() function have 2 modes, one in
    which it only counts the space you need, and one that actually
    does the work.

    Something like this:

    #include <cstdio>
    #include <cstring>
    #include <iostream>

    using namespace std;

    int replace(const char *orig, const char *from, const char *to, char *result)
    {
    int resindex = 0;
    int tolen = strlen(to);
    int fromlen = strlen(from);
    const char *last = orig;
    const char *p;
    while ((p = strstr(last, from)) != NULL) {
    if (result) {
    memcpy(result + resindex, last, p - last);
    memcpy(result + resindex + (p - last), to, tolen);
    }
    resindex += p - last + tolen;
    p += fromlen;
    last = p;
    }
    if (result)
    strcpy(result + resindex, last);
    resindex += strlen(last);

    return resindex;
    }

    int main(int argc, char *argv[])
    {
    const char *string1 = "the dog jumped over the fence";
    const char *string2 = "the";
    const char *string3 = "that";

    int sz = replace(string1, string2, string3, NULL);
    char *res = new char[sz + 1];
    replace(string1, string2, string3, res);
    cout << res << endl;

    return 0;
    }

    Mike.
     
    Miquel van Smoorenburg, May 6, 2012
    #17
  18. Mike Semko

    Old Wolf Guest

    On Apr 25, 12:45 pm, Mike Semko <> wrote:
    > But now this looks even uglier and more unreadable than before :(
    > What do you think about this ?


    Manipulating C-style strings is indeed ugly and
    hard to read, and error-prone; you are stuck with
    that if you insist on working with them. There's
    no elegant solution.

    Also, as others have pointed out but you haven't
    acknowledged, your program leaks memory when
    you call 'new' but do not later delete the memory.
    You could fix this by having the calling function
    store the returned value, and then deleting it after use.
     
    Old Wolf, May 7, 2012
    #18
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Kurt Krueckeberg
    Replies:
    2
    Views:
    732
    =?ISO-8859-1?Q?Ney_Andr=E9_de_Mello_Zunino?=
    Nov 17, 2004
  2. Ben

    Strings, Strings and Damned Strings

    Ben, Jun 22, 2006, in forum: C Programming
    Replies:
    14
    Views:
    814
    Malcolm
    Jun 24, 2006
  3. Kza
    Replies:
    4
    Views:
    441
    Andrew Koenig
    Mar 3, 2006
  4. anonym
    Replies:
    1
    Views:
    1,076
    Knute Johnson
    Jan 15, 2009
  5. IJALAB
    Replies:
    1
    Views:
    108
    Lars Haugseth
    Nov 21, 2006
Loading...

Share This Page