Bug/Gross InEfficiency in HeathField's fgetline program

  • Thread starter Antoninus Twink
  • Start date
A

Antoninus Twink

The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
char *t = malloc(strlen(s) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}
 
R

Richard Heathfield

Antoninus Twink said:
The function below is from Richard HeathField's fgetline program.

It appears to be from my emgen utility, in fact.
For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings.

If it becomes a problem, I'll fix it. So far, it has not been a problem.
It is also written in a hard-to-read and clunky style.

A matter of opinion. Which bit did you find hard to read?
char *dot_to_underscore(const char *s)
{
char *t = malloc(strlen(s) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}

It is not obvious to me that this code correctly replaces the code I wrote.
 
M

Mark McIntyre

The function below is from Richard HeathField's fgetline program.

Troll alert.
it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings.

Possibly but consider the three laws of optimisation, and the data
typically being processed.
It is also written in a hard-to-read and clunky style.

I disagree.
char *t, *u;
if(t=u=malloc(strlen(s)+1))

hilarious !

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
T

Tor Rustad

Antoninus said:
The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient.

What new, R.H. has written slow code for years. ;-)
This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.
ROTFL


Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}

Hmm.. who's code was hard-to-read and clunky?


Anyway, you are missing the point. Strings are usually rather short, and
when located in L1 cache, is doesn't matter much, scanning it 2 or 3 times.

However, a real spead-up here, would be to drop malloc() and use fixed
sized strings. That's the way, to beat the crap out of OO code.
 
R

Richard Heathfield

Tor Rustad said:

Anyway, you are missing the point. Strings are usually rather short, and
when located in L1 cache, is doesn't matter much, scanning it 2 or 3
times.

The important thing is to avoid reading it n times.
However, a real spead-up here, would be to drop malloc() and use fixed
sized strings. That's the way, to beat the crap out of OO code.

If performance were the primary consideration, that's exactly what I'd have
done. Since it wasn't, it isn't. I considered robustness in the face of
long strings to be more important.
 
P

Peter Nilsson

Antoninus Twink said:
The function below is from Richard HeathField's fgetline
program. For some reason, it makes three passes through
the string (a strlen(), a strcpy() then another pass to
change dots) when two would clearly be sufficient. ...

For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.
 
C

CBFalconer

Mark said:
.... snip ...


hilarious !

What is hilarious? It should detect the failure of malloc quite
reliably. Of course the lack of blanks is rather foul.
 
C

Chris Thomasson

Peter Nilsson said:
For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

Perhaps when he says his name out loud with a nervous tinge, it sound as if
there should be to capital letters... Speaking out load:

"That darn Heath, possible minor/stutter, Field!"

Na... Nobody is that serious about the pronouncement of somebody's name...
Ahh, it could be a simple typo...
 
R

Richard Heathfield

Peter Nilsson said:
For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

The mis-capitalisation is consistent with that used by a troll named "Paul"
(with an email address containing "paulcr"), who once threatened to break
my nose, apparently because he didn't know C very well.
 
A

Antoninus Twink

Antoninus Twink said:


It appears to be from my emgen utility, in fact.


If it becomes a problem, I'll fix it. So far, it has not been a problem.

But what's frustrating is that it's an inefficiency that's completely
gratuitous! We all know that micro-optimization is bad, but this is a
micro-anti-optimization, which is surely worse!

The most natural way to look at this is "copy the characters from one
string to another, replacing . by _ when we see it". This has the
benefit of being a 1-pass algorithm. Instead, you split this up "first
copy one string to another; then go back to the beginning and swap . for
_". This makes a simple single operation into two, at the same time
introducing an extra pass through the string! It's not as if there's so
much fiendish complexity here that there's any benefit in breaking it up
into two separate operations.
A matter of opinion. Which bit did you find hard to read?

The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling! Whitespace can help readability, but
excessive whitespace can reduce it, and at the same time give too much
weight to things that aren't important.
It is not obvious to me that this code correctly replaces the code I wrote.

If you believe that it doesn't correctly replace the code you wrote, it
would be easy to demonstrate that by pointing out a specific input s for
which it gives a different result, or an error (syntax error or
undefined behavior or whatever).
 
I

Ian Collins

Antoninus said:
The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling! Whitespace can help readability, but
excessive whitespace can reduce it, and at the same time give too much
weight to things that aren't important.
You must have a very small screen.
 
P

Philip Potter

Antoninus said:
If you believe that it doesn't correctly replace the code you wrote

He hasn't said that this is what he believes. He is stating that your
code does not *obviously* replace his correctly. It is up to you to
prove it does, if you're going to say his is defective and you have come
up with a replacement.
 
J

Joachim Schmitz

Antoninus Twink said:
The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling!
20 lines don't fit your screen???

Bye, Jojo
 
A

Army1987

For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.
And two letters in InEfficiency, when zero would clearly be
sufficient.
 
R

Richard Heathfield

Philip Potter said:
He hasn't said that this is what he believes.

I always knew you could read, Philip. :) Some other people, I'm not so
sure about.
 
R

Richard Bos

Peter Nilsson said:
For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

Well, yes. Now look at OP's /nom de Usenet/. Surprised?

Richard
 
C

Chris Hills

Antoninus Twink said:
The function below is from Richard HeathField's fgetline program.

I know Richard has a thing about JN but can we not all sink to the same
level.

We should be discussing C, as it is used here, not all trying to be
pedants and petty point scoring.
 
R

Richard

Antoninus Twink said:
The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
char *t = malloc(strlen(s) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}

I would move the ++ part

,----
| while(*u++=(*s=='.' ? '_' : *s))
| s++;
`----

But yes, much nicer, easier to read and understand and possibly
faster. And hopefully working :-;
 
T

Thad Smith

The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling! Whitespace can help readability, but
excessive whitespace can reduce it, and at the same time give too much
weight to things that aren't important.


If you believe that it doesn't correctly replace the code you wrote, it
would be easy to demonstrate that by pointing out a specific input s for
which it gives a different result, or an error (syntax error or
undefined behavior or whatever).

What happens when malloc returns a null pointer?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

Fibonacci 0
Adding adressing of IPv6 to program 1
C language. work with text 3
code review 26
Can't solve problems! please Help 0
compressing charatcers 35
Strange bug 65
K&R exercise 5-5 10

Members online

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top