Bug/Gross InEfficiency in HeathField's fgetline program

Antoninus Twink · Oct 7, 2007

The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
char *t = malloc(strlen(s) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}

Richard Heathfield · Oct 7, 2007

Antoninus Twink said:

The function below is from Richard HeathField's fgetline program.

It appears to be from my emgen utility, in fact.

For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings.

If it becomes a problem, I'll fix it. So far, it has not been a problem.

It is also written in a hard-to-read and clunky style.

A matter of opinion. Which bit did you find hard to read?

char *dot_to_underscore(const char *s)
{
char *t = malloc(strlen(s) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}

It is not obvious to me that this code correctly replaces the code I wrote.

Mark McIntyre · Oct 8, 2007

The function below is from Richard HeathField's fgetline program.

Troll alert.

it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings.

Possibly but consider the three laws of optimisation, and the data
typically being processed.

It is also written in a hard-to-read and clunky style.

I disagree.

char *t, *u;
if(t=u=malloc(strlen(s)+1))

hilarious !

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Tor Rustad · Oct 8, 2007

Antoninus said:
The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient.

What new, R.H. has written slow code for years. ;-)

This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.
ROTFL

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}

Hmm.. who's code was hard-to-read and clunky?

Anyway, you are missing the point. Strings are usually rather short, and
when located in L1 cache, is doesn't matter much, scanning it 2 or 3 times.

However, a real spead-up here, would be to drop malloc() and use fixed
sized strings. That's the way, to beat the crap out of OO code.

Richard Heathfield · Oct 8, 2007

Tor Rustad said:

Anyway, you are missing the point. Strings are usually rather short, and
when located in L1 cache, is doesn't matter much, scanning it 2 or 3
times.

The important thing is to avoid reading it n times.

However, a real spead-up here, would be to drop malloc() and use fixed
sized strings. That's the way, to beat the crap out of OO code.

If performance were the primary consideration, that's exactly what I'd have
done. Since it wasn't, it isn't. I considered robustness in the face of
long strings to be more important.

Peter Nilsson · Oct 8, 2007

Antoninus Twink said:
The function below is from Richard HeathField's fgetline
program. For some reason, it makes three passes through
the string (a strlen(), a strcpy() then another pass to
change dots) when two would clearly be sufficient. ...

For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

CBFalconer · Oct 8, 2007

Mark said:
.... snip ...

hilarious !

What is hilarious? It should detect the failure of malloc quite
reliably. Of course the lack of blanks is rather foul.

Chris Thomasson · Oct 8, 2007

Peter Nilsson said:
For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

Perhaps when he says his name out loud with a nervous tinge, it sound as if
there should be to capital letters... Speaking out load:

"That darn Heath, possible minor/stutter, Field!"

Na... Nobody is that serious about the pronouncement of somebody's name...
Ahh, it could be a simple typo...

Richard Heathfield · Oct 8, 2007

Peter Nilsson said:

For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

The mis-capitalisation is consistent with that used by a troll named "Paul"
(with an email address containing "paulcr"), who once threatened to break
my nose, apparently because he didn't know C very well.

Antoninus Twink · Oct 8, 2007

Antoninus Twink said:

It appears to be from my emgen utility, in fact.

If it becomes a problem, I'll fix it. So far, it has not been a problem.

But what's frustrating is that it's an inefficiency that's completely
gratuitous! We all know that micro-optimization is bad, but this is a
micro-anti-optimization, which is surely worse!

The most natural way to look at this is "copy the characters from one
string to another, replacing . by _ when we see it". This has the
benefit of being a 1-pass algorithm. Instead, you split this up "first
copy one string to another; then go back to the beginning and swap . for
_". This makes a simple single operation into two, at the same time
introducing an extra pass through the string! It's not as if there's so
much fiendish complexity here that there's any benefit in breaking it up
into two separate operations.

A matter of opinion. Which bit did you find hard to read?

The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling! Whitespace can help readability, but
excessive whitespace can reduce it, and at the same time give too much
weight to things that aren't important.

It is not obvious to me that this code correctly replaces the code I wrote.

If you believe that it doesn't correctly replace the code you wrote, it
would be easy to demonstrate that by pointing out a specific input s for
which it gives a different result, or an error (syntax error or
undefined behavior or whatever).

Ian Collins · Oct 8, 2007

Antoninus said:
The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling! Whitespace can help readability, but
excessive whitespace can reduce it, and at the same time give too much
weight to things that aren't important.

You must have a very small screen.

Philip Potter · Oct 8, 2007

Antoninus said:
If you believe that it doesn't correctly replace the code you wrote

He hasn't said that this is what he believes. He is stating that your
code does not *obviously* replace his correctly. It is up to you to
prove it does, if you're going to say his is defective and you have come
up with a replacement.

Joachim Schmitz · Oct 8, 2007

Antoninus Twink said:
The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling!

20 lines don't fit your screen???

Bye, Jojo

Army1987 · Oct 8, 2007

For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

And two letters in InEfficiency, when zero would clearly be
sufficient.

Richard Heathfield · Oct 8, 2007

Philip Potter said:

He hasn't said that this is what he believes.

I always knew you could read, Philip.

Some other people, I'm not so
sure about.

Richard Bos · Oct 8, 2007

Peter Nilsson said:
For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

Well, yes. Now look at OP's /nom de Usenet/. Surprised?

Richard

Chris Hills · Oct 8, 2007

Antoninus Twink said:
The function below is from Richard HeathField's fgetline program.

I know Richard has a thing about JN but can we not all sink to the same
level.

We should be discussing C, as it is used here, not all trying to be
pedants and petty point scoring.

Richard · Oct 8, 2007

Antoninus Twink said:
The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
char *t = malloc(strlen(s) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}

I would move the ++ part

,----
| while(*u++=(*s=='.' ? '_' : *s))
| s++;
`----

But yes, much nicer, easier to read and understand and possibly
faster. And hopefully working :-;

Thad Smith · Oct 8, 2007

The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling! Whitespace can help readability, but
excessive whitespace can reduce it, and at the same time give too much
weight to things that aren't important.

If you believe that it doesn't correctly replace the code you wrote, it
would be easy to demonstrate that by pointing out a specific input s for
which it gives a different result, or an error (syntax error or
undefined behavior or whatever).

What happens when malloc returns a null pointer?

santosh · Oct 8, 2007

Thad said:
What happens when malloc returns a null pointer?

AFAICT it returns a null pointer value, same as Richard's version.

Fibonacci	0	May 13, 2023
Adding adressing of IPv6 to program	1	Feb 16, 2023
C language. work with text	3	Dec 10, 2021
code review	26	Feb 6, 2004
Can't solve problems! please Help	0	Sep 26, 2022
compressing charatcers	35	Apr 2, 2014
Strange bug	65	Nov 19, 2010
K&R exercise 5-5	10	Feb 20, 2007

Bug/Gross InEfficiency in HeathField's fgetline program

Antoninus Twink

Richard Heathfield

Mark McIntyre

Tor Rustad

Richard Heathfield

Peter Nilsson

CBFalconer

Chris Thomasson

Richard Heathfield

Antoninus Twink

Ian Collins

Philip Potter

Joachim Schmitz

Army1987

Richard Heathfield

Richard Bos

Chris Hills

Richard

Thad Smith

santosh

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads