Efficiency and the standard library

Discussion in 'C Programming' started by Seebs, Feb 10, 2010.

  1. Seebs

    Seebs Guest

    This may have gotten buried given that it started in a Nilges thread,
    but it's actually pretty interesting.

    There was some discussion of algorithms to perform the task:
    Inputs: target string, replacement string, original string
    Output: a newly-allocated string containing the original
    string with each copy of the target string replaced with the
    replacement string.

    Here's a hunk of mine:

    for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
            ++count;
            t += inlen;
    }

    Here's a hunk of another one:

    > ptrIndex1 = strMaster;
    > while(*ptrIndex1)
    > {
    >     ptrIndex0 = ptrIndex1;
    >     while (-1)
    >     {
    >         for(;
    >             *ptrIndex1 && *ptrIndex1 != *strTarget;
    >             ptrIndex1++);
    >         for(ptrIndex2 = strTarget, ptrIndex3 = ptrIndex1;
    >             *ptrIndex3
    >             &&
    >             *ptrIndex2
    >             &&
    >             *ptrIndex3 == *ptrIndex2;
    >             ptrIndex3++, ptrIndex2++);
    >         if (!*ptrIndex2) break;
    >         ptrIndex1 = ptrIndex3;
    >         if (!*ptrIndex3) break;
    >     }


    These hunks are essentially performing the same part of the core loop;
    find the next occurrence of the target string in the original string.

    The second one was written by Edward Nilges ("spinoza1111"). He offers
    in its defense the assertion that it's the "best" algorithm. (Nevermind
    that it's got a bug; the bug could be fixed easily.)

    My thinking:

    The first is better, not because it's less buggy (that could be fixed), but
    because it's simpler to understand. It may, or may not, be less efficient.
    However, efficiency will vary widely with input characteristics; for some
    cases, the arguable inefficiencies may be completely irrelevant, while in
    others, they'd be significant.

    But you should always write the simpler one first, and wait until you've
    profiled the code and verified that there's a bottleneck, and you know where
    it is, before trying to optimize. You may, or may not, be able to outperform
    a given library's implementation of strstr(). If you have unusual inputs,
    you have a pretty good shot at it... But it may not be worth it. The
    complicated example has had at least one bug identified, and there may be
    others. It's extremely hard to read, partially because of the poor
    naming, and partially because of the counter-idiomatic usage. But mostly,
    it's a lot more work to write and maintain, and for no known benefit.

    It's good to think a little about efficiency, but write for clarity and
    correctness first, so you have a known-working program to check your
    results if you later need to modify it for speed.
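
    For concreteness, here is a minimal sketch of the kind of two-pass
    replace that a counting loop like the one above belongs to. The names
    and structure are illustrative, not the exact code from this thread:

    #include <stdlib.h>
    #include <string.h>

    /* Sketch only: return a newly malloc()ed copy of "original" with every
     * occurrence of "in" replaced by "out"; NULL on failure or empty "in". */
    char *replace_all(const char *original, const char *in, const char *out)
    {
            size_t inlen = strlen(in), outlen = strlen(out);
            size_t count = 0, newlen;
            const char *t, *p;
            char *result, *w;

            if (inlen == 0)
                    return NULL;    /* reject an empty thing-to-replace */

            /* pass 1: count occurrences */
            for (t = strstr(original, in); t; t = strstr(t + inlen, in))
                    ++count;

            newlen = strlen(original) - count * inlen + count * outlen;
            result = malloc(newlen + 1);
            if (!result)
                    return NULL;

            /* pass 2: copy, substituting at each match */
            w = result;
            for (t = original; (p = strstr(t, in)) != NULL; t = p + inlen) {
                    memcpy(w, t, (size_t)(p - t));
                    w += p - t;
                    memcpy(w, out, outlen);
                    w += outlen;
            }
            strcpy(w, t);           /* tail, including the terminating '\0' */
            return result;
    }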

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Feb 10, 2010
    #1

  2. Moi

    Moi Guest

    On Wed, 10 Feb 2010 19:25:54 +0000, Seebs wrote:

    > This may have gotten buried given that it started in a Nilges thread,
    > but it's actually pretty interesting.
    >
    > There was some discussion of algorithms to perform the task:
    > Inputs: target string, replacement string, original string Output: a
    > newly-allocated string containing the original
    > string with each copy of the target string replaced with the
    > replacement string.
    >
    > Here's a hunk of mine:
    >
    > for (count = 0, t = strstr(target, in); t && *t; t = strstr(t,
    > in)) {
    > ++count;
    > t += inlen;
    > }
    >


    Of course yours is more readable.
    Personally, I prefer not to use one-letter variable names (three-letter names
    are more greppable, and easier on the eyes). I also don't like for(;;) statements
    that are too wide (or that repeat subexpressions). But it's all a matter of
    taste, of course.

    I'd prefer:


    for (count = 0, tmp = target; tmp = strstr(tmp, in); ) {
            if (!*tmp) break;
            count++;
            tmp += inlen;
            ...
    }


    My own strstr() loop:

    nhit = 0;
    for (tail = haystack; src = strstr(tail, needle); tail = src + needle_len) {
            nhit++;
            ...
    }

    This uses one variable too many, but that variable will be used later on.

    AvK
     
    Moi, Feb 10, 2010
    #2

  3. Dann Corbit

    Dann Corbit Guest

    In article <>, usenet-
    says...
    >
    > This may have gotten buried given that it started in a Nilges thread,
    > but it's actually pretty interesting.
    >
    > There was some discussion of algorithms to perform the task:
    > Inputs: target string, replacement string, original string
    > Output: a newly-allocated string containing the original
    > string with each copy of the target string replaced with the
    > replacement string.
    >
    > Here's a hunk of mine:
    >
    > for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
    > ++count;
    > t += inlen;
    > }
    >
    > Here's a hunk of another one:
    >
    > > ptrIndex1 = strMaster;
    > > while(*ptrIndex1)
    > > {
    > > ptrIndex0 = ptrIndex1;
    > > while (-1)
    > > {
    > > for(;
    > > *ptrIndex1 && *ptrIndex1 != *strTarget;
    > > ptrIndex1++);
    > > for(ptrIndex2 = strTarget, ptrIndex3 = ptrIndex1;
    > > *ptrIndex3
    > > &&
    > > *ptrIndex2
    > > &&
    > > *ptrIndex3 == *ptrIndex2;
    > > ptrIndex3++, ptrIndex2++);
    > > if (!*ptrIndex2) break;
    > > ptrIndex1 = ptrIndex3;
    > > if (!*ptrIndex3) break;
    > > }

    >
    > These hunks are essentially performing the same part of the core loop;
    > find the next occurrence of the target string in the original string.
    >
    > The second one was written by Edward Nilges ("spinoza1111"). He offers
    > in its defense the assertion that it's the "best" algorithm. (Nevermind
    > that it's got a bug; the bug could be fixed easily.)
    >
    > My thinking:
    >
    > The first is better, not because it's less buggy (that could be fixed), but


    For a standard library, the *most* important thing is correctness. IOW,
    a bsearch() function that is blazing fast but fails if there are more
    than 2 billion items is not as good as one that is only half as fast but
    works with anything up to size_t (-1) number of elements.
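
    (A sketch of how that sort of 2-billion limit usually creeps in: the index
    arithmetic is done in int rather than size_t. This is illustrative code,
    not any particular library's source.)

    #include <stddef.h>

    /* The classic trap:  int lo, hi, mid = (lo + hi) / 2;  breaks past
     * INT_MAX elements, and (lo + hi) can overflow even earlier.
     * Doing the index arithmetic in size_t works up to SIZE_MAX elements. */
    void *bsearch_sketch(const void *key, const void *base, size_t nmemb,
                         size_t size, int (*cmp)(const void *, const void *))
    {
            size_t lo = 0, hi = nmemb;               /* half-open range [lo, hi) */

            while (lo < hi) {
                    size_t mid = lo + (hi - lo) / 2; /* no (lo + hi) overflow */
                    const char *elem = (const char *)base + mid * size;
                    int c = cmp(key, elem);

                    if (c == 0)
                            return (void *)elem;
                    else if (c < 0)
                            hi = mid;
                    else
                            lo = mid + 1;
            }
            return NULL;
    }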

    Assuming correctness, efficiency should be the next decider. An
    interesting project would be to take a collection of all public C
    libraries and do the following:
    1. Exhaustively test for correctness
    2. Analyze for efficiency
    3. Choose best of breed from the above analysis. It could also be a
    hybrid (e.g. a decider function chooses the best algorithm based on
    conditions).

    Here is a list of some compilers (most of these come with source code)
    that could be used as a starting point:

    Directory of c:\compiler
    06/03/2009 11:38 AM <DIR> bscc
    01/21/2010 04:16 PM <DIR> clang-2.6
    01/21/2010 04:35 PM <DIR> commoncpp2-1.7.3
    01/13/2010 01:13 PM <DIR> coreutils-8.4
    10/16/2009 05:39 PM <DIR> ctool_2.12
    10/16/2009 12:39 PM <DIR> fog
    06/04/2009 11:52 AM <DIR> g95-0.92
    06/04/2009 11:58 AM <DIR> gcc-4.4.0
    10/16/2009 05:58 PM <DIR> gcc-4.4.2
    01/21/2010 04:39 PM <DIR> gcc-4.4.3
    10/16/2009 05:59 PM <DIR> ladsoft
    10/16/2009 05:59 PM <DIR> lcc
    10/16/2009 06:00 PM <DIR> llvm
    01/21/2010 04:18 PM <DIR> llvm-2.6
    01/21/2010 04:19 PM <DIR> llvm-gcc4.2-2.6.source
    11/30/2008 08:00 AM <DIR> open64-4.2.1-0
    10/16/2009 06:06 PM <DIR> OW18src
    01/21/2010 04:18 PM <DIR> owdaily
    06/04/2009 11:38 AM <DIR> tendra
    01/21/2010 04:35 PM <DIR> ucommon-2.0.8
    06/03/2009 02:49 PM <DIR> vmkit-0.25
    06/03/2009 12:24 PM <DIR> watcom
    01/21/2010 04:00 PM <DIR> x86_open64-4.2.3
     
    Dann Corbit, Feb 10, 2010
    #3
  4. Ben Pfaff

    Ben Pfaff Guest

    Dann Corbit <> writes:

    > For a standard library, the *most* important thing is correctness. IOW,
    > a bsearch() function that is blazing fast but fails if there are more
    > than 2 billion items is not as good as one that is only half as fast but
    > works with anything up to size_t (-1) number of elements.


    If the former bsearch() implementation is on a system that
    doesn't support more than 2 billion bytes of contiguous data, I'm
    not worried about it.
    --
    "The expression isn't unclear *at all* and only an expert could actually
    have doubts about it"
    --Dan Pop
     
    Ben Pfaff, Feb 10, 2010
    #4
  5. Ben Pfaff wrote:
    > Dann Corbit <> writes:
    >
    >> For a standard library, the *most* important thing is correctness. IOW,
    >> a bsearch() function that is blazing fast but fails if there are more
    >> than 2 billion items is not as good as one that is only half as fast but
    >> works with anything up to size_t (-1) number of elements.

    >
    > If the former bsearch() implementation is on a system that
    > doesn't support more than 2 billion bytes of contiguous data, I'm
    > not worried about it.


    Why?
    --
     
    Phred Phungus, Feb 11, 2010
    #5
  6. Ben Pfaff

    Ben Pfaff Guest

    Phred Phungus <> writes:

    > Ben Pfaff wrote:
    >> Dann Corbit <> writes:
    >>
    >>> For a standard library, the *most* important thing is correctness.
    >>> IOW, a bsearch() function that is blazing fast but fails if there
    >>> are more than 2 billion items is not as good as one that is only
    >>> half as fast but works with anything up to size_t (-1) number of
    >>> elements.

    >>
    >> If the former bsearch() implementation is on a system that
    >> doesn't support more than 2 billion bytes of contiguous data, I'm
    >> not worried about it.

    >
    > Why?


    Because the bsearch() function cannot exceed a limit of 2 billion
    items on an implementation that limits objects to 2 billion bytes
    or less.
    --
    char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
    ={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
    =b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
    2:{i++;if(i)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}
     
    Ben Pfaff, Feb 11, 2010
    #6
  7. Seebs <> writes:

    > This may have gotten buried given that it started in a Nilges thread,
    > but it's actually pretty interesting.
    >
    > There was some discussion of algorithms to perform the task:
    > Inputs: target string, replacement string, original string
    > Output: a newly-allocated string containing the original
    > string with each copy of the target string replaced with the
    > replacement string.
    >
    > Here's a hunk of mine:
    >
    > for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
    > ++count;
    > t += inlen;
    > }
    >
    > Here's a hunk of another one:
    >
    >> ptrIndex1 = strMaster;
    >> while(*ptrIndex1)
    >> {
    >> ptrIndex0 = ptrIndex1;
    >> while (-1)
    >> {
    >> for(;
    >> *ptrIndex1 && *ptrIndex1 != *strTarget;
    >> ptrIndex1++);
    >> for(ptrIndex2 = strTarget, ptrIndex3 = ptrIndex1;
    >> *ptrIndex3
    >> &&
    >> *ptrIndex2
    >> &&
    >> *ptrIndex3 == *ptrIndex2;
    >> ptrIndex3++, ptrIndex2++);
    >> if (!*ptrIndex2) break;
    >> ptrIndex1 = ptrIndex3;
    >> if (!*ptrIndex3) break;
    >> }

    >
    > These hunks are essentially performing the same part of the core loop;
    > find the next occurrence of the target string in the original string.
    >
    > The second one was written by Edward Nilges ("spinoza1111"). He offers
    > in its defense the assertion that it's the "best" algorithm. (Nevermind
    > that it's got a bug; the bug could be fixed easily.)
    >
    > My thinking:
    >
    > The first is better, not because it's less buggy (that could be fixed), but
    > because it's simpler to understand. It may, or may not, be less efficient.
    > However, efficiency will vary widely with input characteristics; for some
    > cases, the arguable inefficiencies may be completely irrelevant, while in
    > others, they'd be significant.


    I don't think you can dismiss the bugs so easily. I would argue that
    it is not a coincidence that the more complex code has more bugs. I
    can't write code that looks like spinoza1111's code and get it right.
    I *have* to write simpler code, broken into simple functions, or I have
    no chance of getting it to be correct.

    For example, the latest improvement introduced another bug[1] and I
    don't think that is simple carelessness. Without a re-write I suspect
    almost any change is as likely to introduce a bug as fix one (I tried
    to see where the error was, but I gave up). If I was paid to fix it,
    I'd simply start again and it would end up as I have already posted,
    though to mirror the behaviour I'd now have to add some argument
    checking.

    <snip>

    [1] Apparently when the target string is not present in the source
    string: e.g. replace("a", "x", "b").

    --
    Ben.
     
    Ben Bacarisse, Feb 11, 2010
    #7
  8. Dann Corbit

    Dann Corbit Guest

    In article <>,
    says...
    >
    > Phred Phungus <> writes:
    >
    > > Ben Pfaff wrote:
    > >> Dann Corbit <> writes:
    > >>
    > >>> For a standard library, the *most* important thing is correctness.
    > >>> IOW, a bsearch() function that is blazing fast but fails if there
    > >>> are more than 2 billion items is not as good as one that is only
    > >>> half as fast but works with anything up to size_t (-1) number of
    > >>> elements.
    > >>
    > >> If the former bsearch() implementation is on a system that
    > >> doesn't support more than 2 billion bytes of contiguous data, I'm
    > >> not worried about it.

    > >
    > > Why?

    >
    > Because the bsearch() function cannot exceed a limit of 2 billion
    > items on an implementation that limits objects to 2 billion bytes
    > or less.



    If we are talking about constructing a best of breed library system for
    the C language, then the distinction is of enormous importance.

    While I agree it is not relevant for the original implementation, it is
    for an implementation where we are trying to pick and choose the best
    possible routines.

    P.S.
    The former case is still a bug waiting to happen.

    Even if the former implementation is on a toaster IC, 40 years in the
    future, our toaster ICs will have 50 GB of RAM on board.
     
    Dann Corbit, Feb 11, 2010
    #8
  9. Seebs

    Seebs Guest

    On 2010-02-11, Ben Bacarisse <> wrote:
    > I don't think you can dismiss the bugs so easily. I would argue that
    > it is not a coincidence that the more complex code has more bugs. I
    > can't write code that looks like spinoza1111's code and get it right.
    > I *have* to write simpler code, broken into simple functions, or I have
    > no chance of getting it to be correct.


    You may be right on this. Certainly, this kind of thing is one of the
    major reasons that I sometimes rewrite things a few times until I am
    sure I can understand the logic just by looking at it. If I have to
    think it through, usually that means something is wrong, or the problem
    is extremely hard.

    > For example, the latest improvement introduced another bug[1] and I
    > don't think that is simple carelessness. Without a re-write I suspect
    > almost any change is as likely to introduce a bug as fix one (I tried
    > to see where the error was, but I gave up). If I was paid to fix it,
    > I'd simply start again and it would end up as I have already posted,
    > though to mirror the behaviour I'd now have to add some argument
    > checking.


    I think it would be interesting to start from his code and see whether it
    can be fixed by cleanups and refactoring. I bet it could, although it'd
    certainly take longer than just writing one from scratch.

    But, e.g., starting by using names a little better than "ptrIndex1" and
    the like would make it a ton simpler.
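
    For instance, a sketch of roughly the same inner scan with descriptive
    names (my guess at a cleanup, not a drop-in fix for the original; an
    empty needle is assumed to have been rejected earlier):

    /* Find the next occurrence of needle in haystack, or NULL if none. */
    const char *find_next(const char *haystack, const char *needle)
    {
            const char *scan;

            for (scan = haystack; *scan; scan++) {
                    const char *h = scan, *n = needle;

                    while (*h && *n && *h == *n) {
                            h++;
                            n++;
                    }
                    if (*n == '\0')
                            return scan;    /* the whole needle matched */
            }
            return NULL;
    }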

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Feb 11, 2010
    #9
  10. Seebs <> wrote:
    > There was some discussion of algorithms to perform the task:
    >   Inputs:  target string, replacement string, original string
    >   Output:  a newly-allocated string containing the original
    >         string with each copy of the target string replaced
    > with the replacement string.
    >
    > Here's a hunk of mine:
    >
    >   for (count = 0, t = strstr(target, in); t && *t;
    > t = strstr(t, in)) {


    Why are you seemingly scanning the target string for occurrences
    of 'in', which I would presume to be the original 'input' string?

    >                 ++count;
    >                 t += inlen;
    >         }


    What if the string being scanned for is empty? If the string
    to be scanned is non-empty, this will produce an infinite loop.

    --
    Peter
     
    Peter Nilsson, Feb 11, 2010
    #10
  11. Seebs

    Seebs Guest

    On 2010-02-11, Peter Nilsson <> wrote:
    >>   for (count = 0, t = strstr(target, in); t && *t;
    >> t = strstr(t, in)) {


    > Why are you seemingly scanning the target string for occurrences
    > of 'in', which I would presume to be the original 'input' string?


    "in", "out", "original".

    > What if the string being scanned for is empty? If the string
    > to be scanned is non-empty, this will produce an infinite loop.


    This was indeed observed, which is why the error checking at the
    beginning of the function is now modified to reject a 0-length
    thing-to-replace.
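
    Something along these lines at the top of the function (a sketch; the
    exact checks and parameter names here are illustrative):

    if (original == NULL || in == NULL || out == NULL || *in == '\0')
            return NULL;    /* reject NULL arguments and an empty thing-to-replace */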

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Feb 11, 2010
    #11
  12. Eric Sosman

    Eric Sosman Guest

    On 2/10/2010 9:30 PM, Seebs wrote:
    > On 2010-02-11, Peter Nilsson<> wrote:
    >>> for (count = 0, t = strstr(target, in); t&& *t;
    >>> t = strstr(t, in)) {

    >
    >> Why are you seemingly scanning the target string for occurrences
    >> of 'in', which I would presume to be the original 'input' string?

    >
    > "in", "out", "original".
    >
    >> What if the string being scanned for is empty? If the string
    >> to be scanned is non-empty, this will produce an infinite loop.

    >
    > This was indeed observed, which is why the error checking at the
    > beginning of the function is now modified to reject a 0-length
    > thing-to-replace.


    ... which is what I imagined the `*t' test in your fragment
    was supposed to catch. If zero-length targets are rejected
    earlier, I'm somehow missing the purpose of the `*t'.

    --
    Eric Sosman
    lid
     
    Eric Sosman, Feb 11, 2010
    #12
  13. Dann Corbit wrote:
    > In article <>,
    > says...
    >> Phred Phungus <> writes:
    >>
    >>> Ben Pfaff wrote:
    >>>> Dann Corbit <> writes:
    >>>>
    >>>>> For a standard library, the *most* important thing is correctness.
    >>>>> IOW, a bsearch() function that is blazing fast but fails if there
    >>>>> are more than 2 billion items is not as good as one that is only
    >>>>> half as fast but works with anything up to size_t (-1) number of
    >>>>> elements.
    >>>> If the former bsearch() implementation is on a system that
    >>>> doesn't support more than 2 billion bytes of contiguous data, I'm
    >>>> not worried about it.
    >>> Why?

    >> Because the bsearch() function cannot exceed a limit of 2 billion
    >> items on an implementation that limits objects to 2 billion bytes
    >> or less.

    >
    >
    > If we are talking about constructing a best of breed library system for
    > the C language, then the distinction is of enormous importance.
    >
    > While I agree it is not relevant for the original implementation, it is
    > for an implementation where we are trying to pick and choose the best
    > possible routines.
    >
    > P.S.
    > The former case is still a bug waiting to happen.
    >
    > Even if the former implementation is on a toaster IC, 40 years in the
    > future, our toaster ICs will have 50 GB of RAM on board.


    That's a lot of computer power for something that requires almost none.
    --
    fred
     
    Phred Phungus, Feb 11, 2010
    #13
  14. Mark

    Mark Guest

    Richard wrote:
    > Total nonsense.
    >
    > The better one is the one that runs more efficiently and works. So
    > assuming the bug is fixed and it's more efficient, then the other is
    > better. It's called a function. You don't need to know what's in it.


    The one who will maintain this and other functions needs to know what's in
    it.

    --
    Mark
     
    Mark, Feb 11, 2010
    #14
  15. On 11 Feb, 01:05, Ben Bacarisse <> wrote:
    > Seebs <> writes:


    > > This may have gotten buried given that it started in a Nilges thread,
    > > but it's actually pretty interesting.

    >
    > > There was some discussion of algorithms to perform the task:
    > >    Inputs:  target string, replacement string, original string
    > >    Output:  a newly-allocated string containing the original
    > >      string with each copy of the target string replaced with the
    > >      replacement string.

    >
    > > Here's a hunk of mine:

    >
    > >         for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
    > >                 ++count;
    > >                 t += inlen;
    > >         }

    >
    > > Here's a hunk of another one:

    >
    > >>     ptrIndex1 = strMaster;
    > >>     while(*ptrIndex1)
    > >>     {
    > >>         ptrIndex0 = ptrIndex1;
    > >>         while (-1)
    > >>         {
    > >>             for(;
    > >>                 *ptrIndex1 && *ptrIndex1 != *strTarget;
    > >>                 ptrIndex1++);
    > >>             for(ptrIndex2 = strTarget, ptrIndex3 = ptrIndex1;
    > >>                 *ptrIndex3
    > >>                 &&
    > >>                 *ptrIndex2
    > >>                 &&
    > >>                 *ptrIndex3 == *ptrIndex2;
    > >>                 ptrIndex3++, ptrIndex2++);
    > >>             if (!*ptrIndex2) break;
    > >>             ptrIndex1 = ptrIndex3;
    > >>             if (!*ptrIndex3) break;
    > >>         }

    >
    > > These hunks are essentially performing the same part of the core loop;
    > > find the next occurrence of the target string in the original string.

    >
    > > The second one was written by Edward Nilges ("spinoza1111").  He offers
    > > in its defense the assertion that it's the "best" algorithm.  (Nevermind
    > > that it's got a bug; the bug could be fixed easily.)

    >
    > > My thinking:

    >
    > > The first is better, not because it's less buggy (that could be fixed), but
    > > because it's simpler to understand.  It may, or may not, be less efficient.
    > > However, efficiency will vary widely with input characteristics; for some
    > > cases, the arguable inefficiencies may be completely irrelevant, while in
    > > others, they'd be significant.

    >
    > I don't think you can dismiss the bugs so easily.  I would argue that
    > it is not a coincidence that the more complex code has more bugs.  I
    > can't write code that looks like spinoza1111's code and get it right.
    > I *have* to write simpler code, broken into simple functions, or I have
    > no chance of getting it to be correct.


    "I now suggest that we confine ourselves to the design and
    implementation of intellectually manageable programs." Dijkstra


    > For example, the latest improvement introduced another bug[1] and I
    > don't think that is simple carelessness.  Without a re-write I suspect
    > almost any change is as likely to introduce a bug as fix one (I tried
    > to see where the error was, but I gave up).  If I was paid to fix it,
    > I'd simply start again and it would end up as I have already posted,
    > though to mirror the behaviour I'd now have to add some argument
    > checking.
    >
    > <snip>
    >
    > [1] Apparently when the target string is not present in the source
    > string: e.g. replace("a", "x", "b").
     
    Nick Keighley, Feb 11, 2010
    #15
  16. On 10 Feb, 19:25, Seebs <> wrote:

    > This may have gotten buried given that it started in a Nilges thread,
    > but it's actually pretty interesting.
    >
    > There was some discussion of algorithms to perform the task:
    >         Inputs:  target string, replacement string, original string
    >         Output:  a newly-allocated string containing the original
    >           string with each copy of the target string replaced with the
    >           replacement string.
    >
    > Here's a hunk of mine:
    >
    >         for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
    >                 ++count;
    >                 t += inlen;
    >         }
    >
    > Here's a hunk of another one:
    >
    > >     ptrIndex1 = strMaster;
    > >     while(*ptrIndex1)
    > >     {
    > >         ptrIndex0 = ptrIndex1;
    > >         while (-1)
    > >         {
    > >             for(;
    > >                 *ptrIndex1 && *ptrIndex1 != *strTarget;
    > >                 ptrIndex1++);
    > >             for(ptrIndex2 = strTarget, ptrIndex3 = ptrIndex1;
    > >                 *ptrIndex3
    > >                 &&
    > >                 *ptrIndex2
    > >                 &&
    > >                 *ptrIndex3 == *ptrIndex2;
    > >                 ptrIndex3++, ptrIndex2++);
    > >             if (!*ptrIndex2) break;
    > >             ptrIndex1 = ptrIndex3;
    > >             if (!*ptrIndex3) break;
    > >         }

    >
    > These hunks are essentially performing the same part of the core loop;
    > find the next occurrence of the target string in the original string.
    >
    > The second one was written by Edward Nilges ("spinoza1111").  He offers
    > in its defense the assertion that it's the "best" algorithm.  (Nevermind
    > that it's got a bug; the bug could be fixed easily.)
    >
    > My thinking:
    >
    > The first is better, not because it's less buggy (that could be fixed), but
    > because it's simpler to understand.  It may, or may not, be less efficient.
    > However, efficiency will vary widely with input characteristics; for some
    > cases, the arguable inefficiencies may be completely irrelevant, while in
    > others, they'd be significant.
    >
    > But you should always write the simpler one first, and wait until you've
    > profiled the code and verified that there's a bottleneck, and you know where
    > it is, before trying to optimize.  You may, or may not, be able to outperform
    > a given library's implementation of strstr().  If you have unusual inputs,
    > you have a pretty good shot at it... But it may not be worth it.  The
    > complicated example has had at least one bug identified, and there may be
    > others.  It's extremely hard to read, partially because of the poor
    > naming, and partially because of the counter-idiomatic usage.  But mostly,
    > it's a lot more work to write and maintain, and for no known benefit.
    >
    > It's good to think a little about efficiency, but write for clarity and
    > correctness first, so you have a known-working program to check your
    > results if you later need to modify it for speed.


    yes. If strstr() didn't exist it would be necessary to invent it. The
    use of appropriate abstractions simplifies the code.

    We once had a system that manipulated messages. Messages were linked
    lists of memory blocks. Queues were linked lists of messages. Since
    the same pointers were used for both purposes a queue was also a list
    of memory blocks.

    The code needed to free (return to a free list) all the memory blocks
    of the messages that came before a particular message type. Two pages
    of tangled code were needed to find the block before the first block of
    the target message, followed by direct manipulation of the queue head
    and tail pointers and the free-list pointer. The blocks also had to be
    marked as free (they had some associated data). After I found my third
    bug I wrote:

    while (not empty (queue) and queue_head.message_type != target)
    {
        msg = remove_msg (queue)
        free_msg (msg)
    }

    (pseudo code)

    As my version was shorter, simpler, and more correct, it was adopted.
    Why deal with three things at once when it can be delegated elsewhere?
    The code wanted to manipulate messages, not memory blocks.
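
    A rough C rendering of that pseudo code (the type and helper names here
    are made up for illustration, not the real system's):

    struct msg_queue;
    struct message;

    /* assumed helpers from the message layer */
    int queue_empty(const struct msg_queue *q);
    int head_message_type(const struct msg_queue *q);
    struct message *remove_msg(struct msg_queue *q);
    void free_msg(struct message *m);

    /* Free every message ahead of the first one of the target type. */
    void drop_until(struct msg_queue *queue, int target_type)
    {
            while (!queue_empty(queue) && head_message_type(queue) != target_type) {
                    struct message *msg = remove_msg(queue);
                    free_msg(msg);
            }
    }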
     
    Nick Keighley, Feb 11, 2010
    #16
  17. On 11 Feb, 06:52, Richard <> wrote:
    > Seebs <> writes:


    > > There was some discussion of algorithms to perform [some] task:


    [simple code]

    and

    [complex code]

    <snip>

    > > The first is better, not because it's less buggy (that could be fixed), but
    > > because it's simpler to understand.  It may, or may not, be less
    > > efficient.

    >
    > Total nonsense.
    >
    > The better one is the one that runs more efficiently and works.


    how do we demonstrate that it works?

    > So assuming
    > the bug is fixed and it's more efficient, then the other is better. It's
    > called a function. You don't need to know what's in it.


    "There are two ways of constructing a software design: One way is to
    make it so simple that there are obviously no deficiencies, and the
    other way is to make it so complicated that there are no obvious
    deficiencies. The first method is far more difficult."
    -- C.A.R. Hoare
     
    Nick Keighley, Feb 11, 2010
    #17
  18. In article <>,
    Seebs <> wrote:

    >But you should always write the simpler one first, and wait until you've
    >profiled the code and verified that there's a bottleneck, and you know where
    >it is, before trying to optimize.


    But wouldn't that be Extreme Programming?

    (cf http://groups.google.com/group/comp.lang.c/msg/89ba3ef6554c9810?hl=en)

    -- Richard
    --
    Please remember to mention me / in tapes you leave behind.
     
    Richard Tobin, Feb 11, 2010
    #18
  19. Seebs

    Seebs Guest

    On 2010-02-11, Richard Tobin <> wrote:
    > In article <>,
    > Seebs <> wrote:
    >>But you should always write the simpler one first, and wait until you've
    >>profiled the code and verified that there's a bottleneck, and you know where
    >>it is, before trying to optimize.


    > But wouldn't that be Extreme Programming?


    Only if you have two people do it at once, I think? :)

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
     
    Seebs, Feb 11, 2010
    #19
  20. John Bode

    John Bode Guest

    On Feb 11, 2:09 am, "io_x" <> wrote:
    > "Dann Corbit" <> ha scritto nel messaggionews:-september.org...
    >


    [snip]

    >
    > > For a standard library, the *most* important thing is correctness.  IOW,

    >
    > for a standard library function
    > the *most* important thing is ease of use
    > and well-thought-out error handling
    >


    It doesn't matter how easy it is to use if it gives you the wrong
    answer.
    It doesn't matter how well it handles errors if it gives you the wrong
    answer.
    It doesn't matter how fast it is if it gives you the wrong answer.
    It doesn't matter how much memory it uses if it gives you the wrong
    answer.

    Correctness comes first. Then we can argue about everything else.
     
    John Bode, Feb 11, 2010
    #20