Challenge: tightest code to find-replace a string

Discussion in 'C Programming' started by DFS, Jun 6, 2014.

  1. DFS

    DFS Guest

    * reads an existing file
    * writes changes to new file
    * counts replacements made by line
    * counts total replacements made
    * no fancy usage of sed!

    I KNOW someone can better my piddly effort below (actually one I found
    online and made mods to):
    =================================================================================
    #include <stdio.h>
    #include <string.h>

    int findreplace(void)
    {
        int bufferSize = 0x1000;
        int i = 0, k = 0, j = 0;    /* chunk number, per-chunk count, total */
        char buffer[bufferSize];
        FILE *inFile = fopen("random_in.txt", "r");
        FILE *outFile = fopen("random_out.txt", "w");
        const char *find = "46";
        const char *replace = "----";

        if (inFile == NULL || outFile == NULL)
        {
            fprintf(stderr, "Error opening file(s)\n");
            if (inFile != NULL)
                fclose(inFile);
            if (outFile != NULL)
                fclose(outFile);
            return 1;
        }

        printf("Replace '%s' with '%s':\n", find, replace);

        /* fgets() stops at a newline or after bufferSize - 1 characters. */
        while(fgets(buffer, bufferSize, inFile) != NULL)
        {
            char *stop = NULL;
            char *start = buffer;
            k = 0;

            while(1)
            {
                stop = strstr(start, find);

                if(stop == NULL)
                {
                    /* No further match: copy the rest of the chunk. */
                    fwrite(start, 1, strlen(start), outFile);
                    break;
                } else {
                    /* Copy the text before the match, then the replacement. */
                    fwrite(start, 1, stop - start, outFile);
                    fwrite(replace, 1, strlen(replace), outFile);
                    start = stop + strlen(find);
                    k++;
                }
            }

            i++;
            j += k;
            printf("Line %d: %d replacements made\n", i, k);
        }
        printf("%d replacements made.\n", j);

        fclose(inFile);
        fclose(outFile);

        return 0;
    }


    int main(void) {
        return findreplace();
    }

    =================================================================================

    input (random_in.txt)

    14513111664214260256543011122553234523520226455552
    41602561064325541006060354620223361346535061545034
    63164621623130051346620535103421535300201464252314
    30013144611120401561305220534605456101542562311260
    30501506124251042546364005110661421500320026101445
    35355334213621124600100142264440253516210400362562
    65140560414014522562466550406113020500531011441421
    60543325410345553336424511333322104440166124450061
    44310321435636412163052026304311532342515351020026
    10536502643531635353214012163164121056142415600245

    output (random_out.txt)

    14513111664214260256543011122553234523520226455552
    4160256106432554100606035----202233613----535061545034
    6316----216231300513----620535103421535300201----4252314
    3001314----1112040156130522053----05456101542562311260
    305015061242510425----364005110661421500320026101445
    3535533421362112----00100142264440253516210400362562
    65140560414014522562----6550406113020500531011441421
    60543325410345553336424511333322104440166124450061
    44310321435636412163052026304311532342515351020026
    10536502643531635353214012163164121056142415600245

    =================================================================================


    [dfs@home files]$ ./find_replace
    Replace '46' with '----':
    Line 1: 0 replacements made
    Line 2: 2 replacements made
    Line 3: 3 replacements made
    Line 4: 2 replacements made
    Line 5: 1 replacements made
    Line 6: 1 replacements made
    Line 7: 1 replacements made
    Line 8: 0 replacements made
    Line 9: 0 replacements made
    Line 10: 0 replacements made
    10 replacements made.

    =================================================================================
     
    DFS, Jun 6, 2014
    #1

  2. DFS

    Stefan Ram Guest

    What you wrote does not replace strings that contain
    line breaks or occur at 0x1000 boundaries.
     
    Stefan Ram, Jun 6, 2014
    #2
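    Stefan's point about buffer boundaries can be illustrated with a minimal
    sketch. count_chunked() is a hypothetical helper, not code from the thread:
    it counts matches the way the OP's loop does, running strstr() on each
    fgets() chunk separately, so with a deliberately tiny buffer a match that
    straddles two chunks is never seen.

```c
#include <stdio.h>
#include <string.h>

/* Count occurrences of find the way the OP's loop does: strstr() runs on
   each fgets() chunk separately, so a match split between two chunks is
   never counted. bufsize must be at least 2. */
int count_chunked(FILE *in, const char *find, int bufsize)
{
    char buffer[bufsize];              /* C99 variable-length array */
    size_t flen = strlen(find);
    int count = 0;

    if (flen == 0)                     /* "" would match forever */
        return 0;
    while (fgets(buffer, bufsize, in) != NULL) {
        const char *p = buffer;

        while ((p = strstr(p, find)) != NULL) {
            count++;
            p += flen;                 /* resume after this match */
        }
    }
    return count;
}
```

    For example, with "ab46\n" in a rewound tmpfile(), count_chunked(f, "46", 4)
    returns 0, because fgets() delivers "ab4" and then "6\n"; with a 0x1000
    buffer the same call returns 1. In the OP's program the same split happens
    whenever a match straddles the 4095-character cut that fgets() makes in a
    longer line.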

  3. DFS

    Ike Naar Guest

    This could be simplified to

    while (stop = strstr(start, find), stop != NULL)
    {
        fwrite(start, 1, stop - start, outFile);
        fputs(replace, outFile);
        start = stop + strlen(find);
        k++;
    }
    fputs(start, outFile);
     
    Ike Naar, Jun 6, 2014
    #3
  4. DFS

    Noob Guest

    This doesn't "feel" very idiomatic.

    Perhaps

    while ((stop = strstr(start, find)) != NULL)

    or even

    while (stop = strstr(start, find))

    The second one raises warnings with most compilers.

    while ((stop = strstr(start, find)))

    may shut them up.
     
    Noob, Jun 6, 2014
    #4
  5. DFS

    DFS Guest

    OK.

    But the challenge isn't to say what it can't do. It's to show a tighter
    piece of code that does it as well or better.

    Looking forward to your entry!
     
    DFS, Jun 6, 2014
    #5
  6. DFS

    Jorgen Grahn Guest

    But to do that you need to understand what the program is supposed to
    accomplish.

    And by the way, I don't understand what "tight" means. I'd personally
    optimize for memory and I/O use.

    /Jorgen
     
    Jorgen Grahn, Jun 6, 2014
    #6
  7. But it does disqualify your entry as it doesn't accomplish the stated
    goal. Looking forward to your fix!
     
    Mark Storkamp, Jun 6, 2014
    #7
  8. It reports rather than counts these matches. I would never write a
    function with this spec. because it destroys its usefulness in other
    contexts. A function should do one thing well.

    I'd write a string match/replace function that returns the number of
    matches. If I needed the counts reported by line, I'd write a wrapper
    that adds those.

    int replace_string(const char *match, const char *repl, int stopper,
                       FILE *fi, FILE *fo)
    {
        int nmatches = 0, c;
        const char *mp = match;
        while ((c = fgetc(fi)) != EOF && c != stopper)
            if (c == *mp) {
                if (!*++mp) {
                    ++nmatches;
                    fputs(repl, fo);
                }
            }
            else {
                mp = match;
                fputc(c, fo);
            }
        return nmatches;
    }

    Called with stopper == EOF it processes a whole file. Note how removing
    the line buffer actually simplifies the code, whilst also removing an
    unnecessary restriction. It's not uncommon for this to happen (there
    was a recent thread about this).

    Called with stopper == '\n' it processes a line and so this wrapper
    prints the report:

    void replace_string_report(const char *match, const char *repl,
                               FILE *fi, FILE *fo)
    {
        int total_matches = 0, lineno = 0;
        while (!feof(fi)) {
            int nm = replace_string(match, repl, '\n', fi, fo);
            printf("\nLine %d: %d replacements\n", ++lineno, nm);
            total_matches += nm;
        }
        printf("%d replacements\n", total_matches);
    }

    Here's the driver for testing.

    int main(int argc, char **argv)
    {
        if (argc > 2) {
            FILE *fin = argc > 3 ? fopen(argv[3], "r") : stdin;
            FILE *fout = argc > 4 ? fopen(argv[4], "w") : stdout;
            if (fin && fout)
                replace_string_report(argv[1], argv[2], fin, fout);
        }
    }

    Functions that mix tasks that can be logically separated are best
    avoided. Functions with hard-wired file names and strings are, well,
    let's just say, sub-optimal. Students used to say "but it's because I'm
    just testing" but a simple driver like the one above makes testing
    much easier than having the files and strings hard wired.

    <snip>
     
    Ben Bacarisse, Jun 6, 2014
    #8
  9. DFS

    BartC Guest

    The OP's findreplace() function, where everything was hard-coded inside it
    rather than being passed as arguments, did grate a little (that would also be
    the first thing I'd change).

    But I wouldn't bother with command line parameters for testing until it's
    finished. Far easier to just write:

    int main(void) {
        replace_string_report("46", "----", "random_in.txt", "random_out.txt");
    }

    (Although you'd have to decide whether file names or handles are going to be
    passed. If this is the only find&replace operation on the file, then file
    names are probably more appropriate, although it will need more
    error-checking inside the function.)
     
    BartC, Jun 6, 2014
    #9
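  10. /* bug here? */
    fwrite(match, mp - match, 1, fo);

    I think there's a bug in this: when a partial match fails, the characters
    matched so far are never written out. The fwrite() above, placed in the
    else branch before mp is reset, should fix it. Fix untested.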
     
    Malcolm McLean, Jun 6, 2014
    #10
  11. DFS

    Jorgen Grahn Guest

    .
    The easiest and most useful is to default to stdin and stdout, just
    like sed(1) does. The second most useful is to emulate Perl's <>
    operator (stdin, or a sequence of named files, including "-" which
    means stdin).

    /Jorgen
     
    Jorgen Grahn, Jun 6, 2014
    #11
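    The <> behaviour Jorgen describes can be sketched roughly as follows.
    for_each_input() and the example callback count_lines() are hypothetical
    names chosen for illustration; a real tool would pass its own per-file
    function (for instance one that does the find-replace) instead of
    count_lines().

```c
#include <stdio.h>
#include <string.h>

/* Example callback: add the number of newlines read from in to *ctx
   (a long). Returns 0, or -1 on a read error. */
int count_lines(FILE *in, void *ctx)
{
    long *n = ctx;
    int c;

    while ((c = getc(in)) != EOF)
        if (c == '\n')
            ++*n;
    return ferror(in) ? -1 : 0;
}

/* In the spirit of Perl's <>: call fn on stdin when no names are given,
   otherwise on each named file in turn, treating "-" as stdin. Returns 0,
   or -1 as soon as a file cannot be opened or fn reports failure. */
int for_each_input(int nnames, char **names,
                   int (*fn)(FILE *in, void *ctx), void *ctx)
{
    if (nnames == 0)
        return fn(stdin, ctx) == 0 ? 0 : -1;
    for (int i = 0; i < nnames; i++) {
        int use_stdin = strcmp(names[i], "-") == 0;
        FILE *in = use_stdin ? stdin : fopen(names[i], "r");
        int rc;

        if (in == NULL) {
            perror(names[i]);
            return -1;
        }
        rc = fn(in, ctx);
        if (!use_stdin)
            fclose(in);
        if (rc != 0)
            return -1;
    }
    return 0;
}
```

    In a driver like Ben's, where argv[1] and argv[2] are the search and
    replacement strings, the call would be for_each_input(argc - 3, argv + 3,
    fn, ctx) once argc > 2 has been checked.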
  12. DFS

    Stefan Ram Guest

    I think this is not a sufficient specification of requirements.

    For example, it mentions »changes« in line two, but before line
    two, it was not said that anything should be changed at all. So
    it is not clear what »changes« refers to.

    And »to count« something is not behavior that is visible from the
    outside.

    BTW: When given the task

    »replace all the occurrences of "abcabc" by "defdef" in
    "012abcabcabc789" would
    "012defdefdef789" be a correct result? the only correct result?«
     
    Stefan Ram, Jun 6, 2014
    #12
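    Stefan's question hinges on whether occurrences may overlap. A small sketch
    (count_occurrences() is a hypothetical helper) shows the two readings:
    "abcabc" occurs twice in "012abcabcabc789" if occurrences may share
    characters (at offsets 3 and 6), but only once under the leftmost,
    non-overlapping rule that a strstr() loop like the OP's follows. Under that
    rule the result is "012defdefabc789", so "012defdefdef789" is not what such
    a loop produces.

```c
#include <string.h>

/* Count occurrences of find in s. With overlap == 0 the scan resumes
   after each match (as the OP's strstr() loop does); with overlap != 0
   it resumes one character later, so matches may share characters.
   An empty find counts as zero occurrences. */
int count_occurrences(const char *s, const char *find, int overlap)
{
    size_t flen = strlen(find);
    int count = 0;

    if (flen == 0)
        return 0;
    while ((s = strstr(s, find)) != NULL) {
        count++;
        s += overlap ? 1 : flen;
    }
    return count;
}
```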
  13. Actually, this fulfills the requirement pretty much in all cases. All
    cases in which the search string has no occurrences, that is.
    Here's my take at it:

    int findreplace(int searchstring) {
        if (searchstring < 2) {
            return 0;
        } else if (searchstring == 2) {
            return 1;
        } else {
            for (int i = 2; i < searchstring; i++) {
                if ((searchstring % i) == 0) {
                    return 0;
                }
            }
            return 1;
        }
    }

    Cheers,
    Johannes

    --
    Ah, the latest and to date most brilliant coup of our great
    cosmologists: the secret prediction.
    - Karl Kaos about Rüdiger Thomas in dsa <hidbv3$om2$>
     
    Johannes Bauer, Jun 6, 2014
    #13
  14. Yes, thanks. The unmatched portion needs to be printed on failure.
     
    Ben Bacarisse, Jun 6, 2014
    #14
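    A sketch of a corrected replace_string(), under these assumptions: it keeps
    Ben's parameters, adds -1 as an out-of-memory return, and swaps in the
    Knuth-Morris-Pratt failure table (a technique nobody in the thread used) to
    decide how much of a failed partial match to write out. Besides printing the
    unmatched portion, it handles three related problems in the original: the
    character that breaks a partial match can itself start a new match (so "446"
    still contains "46"); after a complete match mp is left pointing at the
    terminator, so the next character can never begin a match; and the stopper
    character is consumed but not copied to the output. The wrapper peeks with
    getc()/ungetc() because while (!feof(fi)) runs once more after the final
    newline and reports an extra, empty line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy fi to fo, replacing each leftmost, non-overlapping occurrence of
   match with repl, until EOF or until the stopper character has been
   copied. Returns the number of replacements, or -1 if malloc fails. */
int replace_string(const char *match, const char *repl, int stopper,
                   FILE *fi, FILE *fo)
{
    size_t n = strlen(match), m = 0;   /* m: length of pending partial match */
    size_t *fail;
    int c, nmatches = 0;

    if (n == 0) {                      /* nothing to find: plain copy */
        while ((c = getc(fi)) != EOF) {
            putc(c, fo);
            if (c == stopper)
                break;
        }
        return 0;
    }
    fail = malloc(n * sizeof *fail);
    if (fail == NULL)
        return -1;
    /* KMP failure table: fail[i] is the length of the longest proper
       prefix of match[0..i] that is also a suffix of it. */
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < n; i++) {
        while (k > 0 && match[i] != match[k])
            k = fail[k - 1];
        if (match[i] == match[k])
            k++;
        fail[i] = k;
    }
    while ((c = getc(fi)) != EOF) {
        size_t old = m;

        if (c == stopper) {            /* flush pending text, copy stopper */
            fwrite(match, 1, m, fo);
            m = 0;
            putc(c, fo);
            break;
        }
        /* Shrink the pending prefix until c can extend it; the characters
           dropped from its front cannot start a match, so write them. */
        while (m > 0 && c != (unsigned char)match[m])
            m = fail[m - 1];
        fwrite(match, 1, old - m, fo);
        if (c == (unsigned char)match[m]) {
            if (++m == n) {            /* complete match */
                fputs(repl, fo);
                nmatches++;
                m = 0;
            }
        } else {
            putc(c, fo);               /* m is 0 here: c starts no match */
        }
    }
    fwrite(match, 1, m, fo);           /* unmatched tail at EOF */
    free(fail);
    return nmatches;
}

/* Report wrapper: peek before each line so that a final newline does
   not produce an extra, empty "line". Returns the total, or -1. */
int replace_string_report(const char *match, const char *repl,
                          FILE *fi, FILE *fo)
{
    int total = 0, lineno = 0, c;

    while ((c = getc(fi)) != EOF) {
        ungetc(c, fi);
        int nm = replace_string(match, repl, '\n', fi, fo);
        if (nm < 0)
            return -1;
        printf("Line %d: %d replacements\n", ++lineno, nm);
        total += nm;
    }
    printf("%d replacements\n", total);
    return total;
}
```

    Called with stopper == EOF, replace_string() still processes a whole file,
    and matches can then span line breaks.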
  15. For a few lines of code you get much greater flexibility in testing.
    Maybe your environment does not make command-line programs easy to run?
    Why would you ever use file names? It's inherently a stream operation,
    so limiting it to named files just makes it clunky, in my view.
     
    Ben Bacarisse, Jun 6, 2014
    #15
  16. DFS

    BartC Guest

    I don't understand streams. I like things to have a beginning and an end,
    and a whole file is a well-understood chunk of data to work on, if it's not
    possible to just work on strings (which would be my approach; then it would
    be independent from files *and* streams).

    Imagine if you were creating some string functions where strings didn't have
    a well-defined end and could conceivably have an unlimited length...
     
    BartC, Jun 6, 2014
    #16
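    BartC's string-only approach can be sketched minimally as below;
    str_replace_all() is a hypothetical name, and the caller must free() the
    result. Because the whole text is in memory, neither line breaks nor buffer
    boundaries can split a match, at the cost of holding the entire input at
    once.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of s with each leftmost, non-overlapping
   occurrence of find replaced by repl, or NULL if malloc fails. If count
   is not NULL it receives the number of replacements. An empty find
   matches nothing. */
char *str_replace_all(const char *s, const char *find, const char *repl,
                      int *count)
{
    size_t flen = strlen(find), rlen = strlen(repl);
    size_t outlen = strlen(s);
    const char *p, *hit;
    char *out, *q;
    int n = 0;

    /* First pass: count matches so the result can be sized exactly. */
    if (flen > 0)
        for (p = s; (p = strstr(p, find)) != NULL; p += flen)
            n++;
    outlen = outlen - (size_t)n * flen + (size_t)n * rlen;

    out = malloc(outlen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    q = out;
    p = s;
    while (flen > 0 && (hit = strstr(p, find)) != NULL) {
        memcpy(q, p, (size_t)(hit - p));
        q += hit - p;
        memcpy(q, repl, rlen);
        q += rlen;
        p = hit + flen;
    }
    strcpy(q, p);                      /* tail after the last match, plus '\0' */
    if (count != NULL)
        *count = n;
    return out;
}
```

    For instance, str_replace_all("ab46\n46", "46", "----", &n) returns
    "ab----\n----" and sets n to 2.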
  17. A stream can have an end. And not all named files do. I don't think
    this is a useful distinction.

    Anyway, if you don't like streams, I see no reason to make you like
    them. I like my way and I imagine you are happy with yours.
    That does not sound like what I meant by "this is a stream operation".
    It certainly does not apply in the case being discussed.
     
    Ben Bacarisse, Jun 6, 2014
    #17
  18. DFS

    Chad Guest

    Ack, my browser refuses to include the quoted text in my reply. Anyhow, having strings hardwired into a function could, in some cases, change or break the function. The one example that comes to mind is functions that add text to some kind of graphic. If the string name was hardwired in, the computer could possibly interpret that string as a single point on the plane. That could be bad, since a piece of text can sometimes span a line.
     
    Chad, Jun 7, 2014
    #18
  19. DFS

    Stefan Ram Guest

    Typically, a newsreader is used for Usenet access.
    How is the »hello, world« program written, then,
    without a »string hardwired« into the main function?
    drawText( canvas, "hello, world" )

    risks that the computer can interpret »hello, world«
    as »a single point on the plane«?
     
    Stefan Ram, Jun 7, 2014
    #19
  20. DFS

    Chad Guest

    From my limited experience, the problem arises if you view the function as performing some kind of action on a string. From this vantage point, the function would move the string along some line as it executes. Once the function is done, the string would stop at some point. Now if you let s represent some string, the same thing would happen. However, s would be the entire length of the traversal.
     
    Chad, Jun 7, 2014
    #20