Bug/Gross InEfficiency in HeathField's fgetline program

Discussion in 'C Programming' started by Antoninus Twink, Oct 7, 2007.

  1. The function below is from Richard HeathField's fgetline program. For
    some reason, it makes three passes through the string (a strlen(), a
    strcpy() then another pass to change dots) when two would clearly be
    sufficient. This could lead to unnecessarily bad performance on very
    long strings. It is also written in a hard-to-read and clunky style.

    char *dot_to_underscore(const char *s)
    {
    char *t = malloc(strlen(s) + 1);
    if(t != NULL)
    {
    char *u;
    strcpy(t, s);
    u = t;
    while(*u)
    {
    if(*u == '.')
    {
    *u = '_';
    }
    ++u;
    }
    }
    return
    t;
    }

    Proposed solution:

    char *dot_to_underscore(const char *s)
    {
    char *t, *u;
    if(t=u=malloc(strlen(s)+1))
    while(*u++=(*s=='.' ? s++, '_' : *s++));
    return t;
    }
    Antoninus Twink, Oct 7, 2007
    #1
    1. Advertising

  2. Antoninus Twink said:

    > The function below is from Richard HeathField's fgetline program.


    It appears to be from my emgen utility, in fact.

    > For
    > some reason, it makes three passes through the string (a strlen(), a
    > strcpy() then another pass to change dots) when two would clearly be
    > sufficient. This could lead to unnecessarily bad performance on very
    > long strings.


    If it becomes a problem, I'll fix it. So far, it has not been a problem.

    > It is also written in a hard-to-read and clunky style.


    A matter of opinion. Which bit did you find hard to read?

    > char *dot_to_underscore(const char *s)
    > {
    > char *t = malloc(strlen(s) + 1);
    > if(t != NULL)
    > {
    > char *u;
    > strcpy(t, s);
    > u = t;
    > while(*u)
    > {
    > if(*u == '.')
    > {
    > *u = '_';
    > }
    > ++u;
    > }
    > }
    > return
    > t;
    > }
    >
    > Proposed solution:
    >
    > char *dot_to_underscore(const char *s)
    > {
    > char *t, *u;
    > if(t=u=malloc(strlen(s)+1))
    > while(*u++=(*s=='.' ? s++, '_' : *s++));
    > return t;
    > }


    It is not obvious to me that this code correctly replaces the code I wrote.

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
    Richard Heathfield, Oct 7, 2007
    #2
    1. Advertising

  3. On Mon, 8 Oct 2007 00:16:07 +0200 (CEST), in comp.lang.c , Antoninus
    Twink <> wrote:

    >The function below is from Richard HeathField's fgetline program.


    Troll alert.

    >it makes three passes through the string (a strlen(), a
    >strcpy() then another pass to change dots) when two would clearly be
    >sufficient. This could lead to unnecessarily bad performance on very
    >long strings.


    Possibly but consider the three laws of optimisation, and the data
    typically being processed.

    >It is also written in a hard-to-read and clunky style.


    I disagree.

    > char *t, *u;
    > if(t=u=malloc(strlen(s)+1))


    hilarious !

    --
    Mark McIntyre

    "Debugging is twice as hard as writing the code in the first place.
    Therefore, if you write the code as cleverly as possible, you are,
    by definition, not smart enough to debug it."
    --Brian Kernighan
    Mark McIntyre, Oct 8, 2007
    #3
  4. Antoninus Twink

    Tor Rustad Guest

    Antoninus Twink wrote:

    > The function below is from Richard HeathField's fgetline program. For
    > some reason, it makes three passes through the string (a strlen(), a
    > strcpy() then another pass to change dots) when two would clearly be
    > sufficient.


    What new, R.H. has written slow code for years. ;-)

    > This could lead to unnecessarily bad performance on very
    > long strings. It is also written in a hard-to-read and clunky style.


    ROTFL


    > Proposed solution:
    >
    > char *dot_to_underscore(const char *s)
    > {
    > char *t, *u;
    > if(t=u=malloc(strlen(s)+1))
    > while(*u++=(*s=='.' ? s++, '_' : *s++));
    > return t;
    > }


    Hmm.. who's code was hard-to-read and clunky?


    Anyway, you are missing the point. Strings are usually rather short, and
    when located in L1 cache, is doesn't matter much, scanning it 2 or 3 times.

    However, a real spead-up here, would be to drop malloc() and use fixed
    sized strings. That's the way, to beat the crap out of OO code.

    --
    Tor <torust [at] online [dot] no>

    C-FAQ: http://c-faq.com/
    Tor Rustad, Oct 8, 2007
    #4
  5. Tor Rustad said:

    <snip>
    >
    > Anyway, you are missing the point. Strings are usually rather short, and
    > when located in L1 cache, is doesn't matter much, scanning it 2 or 3
    > times.


    The important thing is to avoid reading it n times.

    > However, a real spead-up here, would be to drop malloc() and use fixed
    > sized strings. That's the way, to beat the crap out of OO code.


    If performance were the primary consideration, that's exactly what I'd have
    done. Since it wasn't, it isn't. I considered robustness in the face of
    long strings to be more important.

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
    Richard Heathfield, Oct 8, 2007
    #5
  6. Antoninus Twink <> wrote:
    > The function below is from Richard HeathField's fgetline
    > program. For some reason, it makes three passes through
    > the string (a strlen(), a strcpy() then another pass to
    > change dots) when two would clearly be sufficient. ...


    For some reason you've capitalised three letters in his
    name, when two would clearly be sufficient.

    --
    Peter
    Peter Nilsson, Oct 8, 2007
    #6
  7. Antoninus Twink

    CBFalconer Guest

    Mark McIntyre wrote:
    > Antoninus Twink <> wrote:
    >

    .... snip ...
    >
    >> char *t, *u;
    >> if(t=u=malloc(strlen(s)+1))

    >
    > hilarious !


    What is hilarious? It should detect the failure of malloc quite
    reliably. Of course the lack of blanks is rather foul.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    --
    Posted via a free Usenet account from http://www.teranews.com
    CBFalconer, Oct 8, 2007
    #7
  8. Re: Re:Bug/Gross InEfficiency in HeathField's fgetline program

    "Peter Nilsson" <> wrote in message
    news:...
    > Antoninus Twink <> wrote:
    >> The function below is from Richard HeathField's fgetline
    >> program. For some reason, it makes three passes through
    >> the string (a strlen(), a strcpy() then another pass to
    >> change dots) when two would clearly be sufficient. ...

    >
    > For some reason you've capitalised three letters in his
    > name, when two would clearly be sufficient.


    Perhaps when he says his name out loud with a nervous tinge, it sound as if
    there should be to capital letters... Speaking out load:

    "That darn Heath, possible minor/stutter, Field!"

    Na... Nobody is that serious about the pronouncement of somebody's name...
    Ahh, it could be a simple typo...
    Chris Thomasson, Oct 8, 2007
    #8
  9. Peter Nilsson said:

    > Antoninus Twink <> wrote:
    >> The function below is from Richard HeathField's fgetline
    >> program. For some reason, it makes three passes through
    >> the string (a strlen(), a strcpy() then another pass to
    >> change dots) when two would clearly be sufficient. ...

    >
    > For some reason you've capitalised three letters in his
    > name, when two would clearly be sufficient.


    The mis-capitalisation is consistent with that used by a troll named "Paul"
    (with an email address containing "paulcr"), who once threatened to break
    my nose, apparently because he didn't know C very well.

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
    Richard Heathfield, Oct 8, 2007
    #9
  10. On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
    > Antoninus Twink said:
    >
    >> The function below is from Richard HeathField's fgetline program.

    >
    > It appears to be from my emgen utility, in fact.
    >
    >> For
    >> some reason, it makes three passes through the string (a strlen(), a
    >> strcpy() then another pass to change dots) when two would clearly be
    >> sufficient. This could lead to unnecessarily bad performance on very
    >> long strings.

    >
    > If it becomes a problem, I'll fix it. So far, it has not been a problem.


    But what's frustrating is that it's an inefficiency that's completely
    gratuitous! We all know that micro-optimization is bad, but this is a
    micro-anti-optimization, which is surely worse!

    The most natural way to look at this is "copy the characters from one
    string to another, replacing . by _ when we see it". This has the
    benefit of being a 1-pass algorithm. Instead, you split this up "first
    copy one string to another; then go back to the beginning and swap . for
    _". This makes a simple single operation into two, at the same time
    introducing an extra pass through the string! It's not as if there's so
    much fiendish complexity here that there's any benefit in breaking it up
    into two separate operations.

    >
    >> It is also written in a hard-to-read and clunky style.

    >
    > A matter of opinion. Which bit did you find hard to read?


    The function is a completely trivial one, yet I can't see it all at once
    in my editor without scrolling! Whitespace can help readability, but
    excessive whitespace can reduce it, and at the same time give too much
    weight to things that aren't important.

    >
    >> char *dot_to_underscore(const char *s)
    >> {
    >> char *t = malloc(strlen(s) + 1);
    >> if(t != NULL)
    >> {
    >> char *u;
    >> strcpy(t, s);
    >> u = t;
    >> while(*u)
    >> {
    >> if(*u == '.')
    >> {
    >> *u = '_';
    >> }
    >> ++u;
    >> }
    >> }
    >> return
    >> t;
    >> }
    >>
    >> Proposed solution:
    >>
    >> char *dot_to_underscore(const char *s)
    >> {
    >> char *t, *u;
    >> if(t=u=malloc(strlen(s)+1))
    >> while(*u++=(*s=='.' ? s++, '_' : *s++));
    >> return t;
    >> }

    >
    > It is not obvious to me that this code correctly replaces the code I wrote.


    If you believe that it doesn't correctly replace the code you wrote, it
    would be easy to demonstrate that by pointing out a specific input s for
    which it gives a different result, or an error (syntax error or
    undefined behavior or whatever).
    Antoninus Twink, Oct 8, 2007
    #10
  11. Antoninus Twink

    Ian Collins Guest

    Antoninus Twink wrote:
    > On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
    >> Antoninus Twink said:
    >>> It is also written in a hard-to-read and clunky style.

    >> A matter of opinion. Which bit did you find hard to read?

    >
    > The function is a completely trivial one, yet I can't see it all at once
    > in my editor without scrolling! Whitespace can help readability, but
    > excessive whitespace can reduce it, and at the same time give too much
    > weight to things that aren't important.
    >

    You must have a very small screen.

    --
    Ian Collins.
    Ian Collins, Oct 8, 2007
    #11
  12. Antoninus Twink wrote:
    > On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
    >> It is not obvious to me that this code correctly replaces the code I wrote.

    >
    > If you believe that it doesn't correctly replace the code you wrote


    He hasn't said that this is what he believes. He is stating that your
    code does not *obviously* replace his correctly. It is up to you to
    prove it does, if you're going to say his is defective and you have come
    up with a replacement.

    --
    Philip Potter pgp <at> doc.ic.ac.uk
    Philip Potter, Oct 8, 2007
    #12
  13. "Antoninus Twink" <> schrieb im Newsbeitrag
    news:...
    > On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
    >> Antoninus Twink said:
    >>> It is also written in a hard-to-read and clunky style.

    >>
    >> A matter of opinion. Which bit did you find hard to read?

    >
    > The function is a completely trivial one, yet I can't see it all at once
    > in my editor without scrolling!

    20 lines don't fit your screen???

    Bye, Jojo
    Joachim Schmitz, Oct 8, 2007
    #13
  14. Antoninus Twink

    Army1987 Guest

    Re: Re:Bug/Gross InEfficiency in HeathField's fgetline program

    On Sun, 07 Oct 2007 19:59:08 -0700, Peter Nilsson wrote:

    > Antoninus Twink <> wrote:
    >> The function below is from Richard HeathField's fgetline
    >> program. For some reason, it makes three passes through
    >> the string (a strlen(), a strcpy() then another pass to
    >> change dots) when two would clearly be sufficient. ...

    >
    > For some reason you've capitalised three letters in his
    > name, when two would clearly be sufficient.

    And two letters in InEfficiency, when zero would clearly be
    sufficient.

    --
    Army1987 (Replace "NOSPAM" with "email")
    A hamburger is better than nothing.
    Nothing is better than eternal happiness.
    Therefore, a hamburger is better than eternal happiness.
    Army1987, Oct 8, 2007
    #14
  15. Philip Potter said:

    > Antoninus Twink wrote:
    >> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
    >>> It is not obvious to me that this code correctly replaces the code I
    >>> wrote.

    >>
    >> If you believe that it doesn't correctly replace the code you wrote

    >
    > He hasn't said that this is what he believes.


    I always knew you could read, Philip. :) Some other people, I'm not so
    sure about.

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
    Richard Heathfield, Oct 8, 2007
    #15
  16. Antoninus Twink

    Richard Bos Guest

    Peter Nilsson <> wrote:

    > Antoninus Twink <> wrote:
    > > The function below is from Richard HeathField's fgetline
    > > program. For some reason, it makes three passes through
    > > the string (a strlen(), a strcpy() then another pass to
    > > change dots) when two would clearly be sufficient. ...

    >
    > For some reason you've capitalised three letters in his
    > name, when two would clearly be sufficient.


    Well, yes. Now look at OP's /nom de Usenet/. Surprised?

    Richard
    Richard Bos, Oct 8, 2007
    #16
  17. Antoninus Twink

    Chris Hills Guest

    In article <>, Antoninus Twink
    <> writes
    >The function below is from Richard HeathField's fgetline program.


    I know Richard has a thing about JN but can we not all sink to the same
    level.

    We should be discussing C, as it is used here, not all trying to be
    pedants and petty point scoring.


    --
    \/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
    \/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
    /\/\/ www.phaedsys.org \/\/\
    \/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
    Chris Hills, Oct 8, 2007
    #17
  18. Antoninus Twink

    Richard Guest

    Antoninus Twink <> writes:

    > The function below is from Richard HeathField's fgetline program. For
    > some reason, it makes three passes through the string (a strlen(), a
    > strcpy() then another pass to change dots) when two would clearly be
    > sufficient. This could lead to unnecessarily bad performance on very
    > long strings. It is also written in a hard-to-read and clunky style.
    >
    > char *dot_to_underscore(const char *s)
    > {
    > char *t = malloc(strlen(s) + 1);
    > if(t != NULL)
    > {
    > char *u;
    > strcpy(t, s);
    > u = t;
    > while(*u)
    > {
    > if(*u == '.')
    > {
    > *u = '_';
    > }
    > ++u;
    > }
    > }
    > return
    > t;
    > }
    >
    > Proposed solution:
    >
    > char *dot_to_underscore(const char *s)
    > {
    > char *t, *u;
    > if(t=u=malloc(strlen(s)+1))
    > while(*u++=(*s=='.' ? s++, '_' : *s++));
    > return t;
    > }


    I would move the ++ part

    ,----
    | while(*u++=(*s=='.' ? '_' : *s))
    | s++;
    `----

    But yes, much nicer, easier to read and understand and possibly
    faster. And hopefully working :-;
    Richard, Oct 8, 2007
    #18
  19. Antoninus Twink

    Thad Smith Guest

    Antoninus Twink wrote:
    > On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
    >> Antoninus Twink said:


    > The function is a completely trivial one, yet I can't see it all at once
    > in my editor without scrolling! Whitespace can help readability, but
    > excessive whitespace can reduce it, and at the same time give too much
    > weight to things that aren't important.
    >
    >>> char *dot_to_underscore(const char *s)
    >>> {
    >>> char *t = malloc(strlen(s) + 1);
    >>> if(t != NULL)
    >>> {
    >>> char *u;
    >>> strcpy(t, s);
    >>> u = t;
    >>> while(*u)
    >>> {
    >>> if(*u == '.')
    >>> {
    >>> *u = '_';
    >>> }
    >>> ++u;
    >>> }
    >>> }
    >>> return
    >>> t;
    >>> }
    >>>
    >>> Proposed solution:
    >>>
    >>> char *dot_to_underscore(const char *s)
    >>> {
    >>> char *t, *u;
    >>> if(t=u=malloc(strlen(s)+1))
    >>> while(*u++=(*s=='.' ? s++, '_' : *s++));
    >>> return t;
    >>> }

    >> It is not obvious to me that this code correctly replaces the code I wrote.

    >
    > If you believe that it doesn't correctly replace the code you wrote, it
    > would be easy to demonstrate that by pointing out a specific input s for
    > which it gives a different result, or an error (syntax error or
    > undefined behavior or whatever).


    What happens when malloc returns a null pointer?

    --
    Thad
    Thad Smith, Oct 8, 2007
    #19
  20. Antoninus Twink

    santosh Guest

    Thad Smith wrote:

    > Antoninus Twink wrote:
    >> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
    >>> Antoninus Twink said:

    >
    >> The function is a completely trivial one, yet I can't see it all at
    >> once in my editor without scrolling! Whitespace can help readability,
    >> but excessive whitespace can reduce it, and at the same time give too
    >> much weight to things that aren't important.
    >>
    >>>> char *dot_to_underscore(const char *s)
    >>>> {
    >>>> char *t = malloc(strlen(s) + 1);
    >>>> if(t != NULL)
    >>>> {
    >>>> char *u;
    >>>> strcpy(t, s);
    >>>> u = t;
    >>>> while(*u)
    >>>> {
    >>>> if(*u == '.')
    >>>> {
    >>>> *u = '_';
    >>>> }
    >>>> ++u;
    >>>> }
    >>>> }
    >>>> return
    >>>> t;
    >>>> }
    >>>>
    >>>> Proposed solution:
    >>>>
    >>>> char *dot_to_underscore(const char *s)
    >>>> {
    >>>> char *t, *u;
    >>>> if(t=u=malloc(strlen(s)+1))
    >>>> while(*u++=(*s=='.' ? s++, '_' : *s++));
    >>>> return t;
    >>>> }
    >>> It is not obvious to me that this code correctly replaces the code I
    >>> wrote.

    >>
    >> If you believe that it doesn't correctly replace the code you wrote,
    >> it would be easy to demonstrate that by pointing out a specific input
    >> s for which it gives a different result, or an error (syntax error or
    >> undefined behavior or whatever).

    >
    > What happens when malloc returns a null pointer?


    AFAICT it returns a null pointer value, same as Richard's version.
    santosh, Oct 8, 2007
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jeff Rodriguez
    Replies:
    4
    Views:
    428
    Jeff Rodriguez
    Nov 17, 2003
  2. Program inefficiency?

    , Sep 29, 2007, in forum: Python
    Replies:
    17
    Views:
    597
    Florian Schmidt
    Oct 1, 2007
  3. Replies:
    6
    Views:
    291
    Dimiter \malkia\ Stanev
    Oct 1, 2007
  4. Dik T. Winter
    Replies:
    2
    Views:
    336
    Charlie Gordon
    Nov 7, 2007
  5. Antoninus Twink

    K&R2 ex. 2-4: Heathfield's gross inefficiency

    Antoninus Twink, May 12, 2009, in forum: C Programming
    Replies:
    37
    Views:
    972
    Antoninus Twink
    May 21, 2009
Loading...

Share This Page