Bug/Gross InEfficiency in HeathField's fgetline program

  • Thread starter Antoninus Twink

santosh

Richard said:
If you can be bothered, what happens if you replace the strcpy() in
RH's code with a memcpy() (the length being already known)?

The average time drops to 1.320000s. Not much.
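For reference, the substitution being discussed might look like this (a sketch: RH's posted function with the strcpy() replaced by memcpy() reusing the length already computed for the malloc(); the function name is a stand-in, not from the thread):

```c
#include <stdlib.h>
#include <string.h>

/* RH's function with strcpy() swapped for memcpy(); the length is
   already known from the allocation, so it is reused for the copy.
   The name is an editorial stand-in. */
char *dot_to_underscore_memcpy(const char *s)
{
    size_t len = strlen(s);
    char *t = malloc(len + 1);
    if (t != NULL) {
        char *u;
        memcpy(t, s, len + 1);      /* length known; copies the '\0' too */
        for (u = t; *u != '\0'; ++u) {
            if (*u == '.') {
                *u = '_';
            }
        }
    }
    return t;
}
```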
 

Richard Heathfield

santosh said:

RH's version = 1.480000s
AT's version = 1.212500s

[system is Pentium Dual Core 1.6 GHz with 1 GB RAM]
So for this system at least, AT's version is significantly faster.

No, it isn't significantly faster. The function is called *once* by the
program of which it is a part. On my system, it takes less than a
microsecond to run. Let's round up and call it one microsecond. According
to your timings, AT's version saves (1.48 - 1.2125)/1.48 = approximately
0.180743 of the time. This equates to 181 nanoseconds per program run.

To save as much as a single second, you'd have to run the program well over
five *million* times. If it only took me one minute to verify the change,
update the source, recompile, and re-test, I'd still have to run the code
three hundred million times just to break even. I type at 40wpm, so I can
probably manage to add one error message identifier and meaningful message
text in about five seconds. 300,000,000 * 5 = 1,500,000,000 seconds. So
breaking even would take 47 years (assuming I did nothing else for the
next 47 years, such as eating and sleeping and actual programming and
watching LOTR and so on).

So in fact it would be counter-productive to adopt the change, even if I
thought it were an improvement, which I don't.
 

santosh

Richard said:
santosh said:

RH's version = 1.480000s
AT's version = 1.212500s

[system is Pentium Dual Core 1.6 GHz with 1 GB RAM]
So for this system at least, AT's version is significantly faster.

No, it isn't significantly faster. The function is called *once* by
the program of which it is a part. On my system, it takes less than a
microsecond to run. Let's round up and call it one microsecond.
According to your timings, AT's version saves (1.48 - 1.2125)/1.48 =
approximately 0.180743 of the time. This equates to 181 nanoseconds
per program run.

To save as much as a single second, you'd have to run the program well
over five *million* times. If it only took me one minute to verify the
change, update the source, recompile, and re-test, I'd still have to
run the code three hundred million times just to break even. I type at
40wpm, so I can probably manage to add one error message identifier
and meaningful message text in about five seconds. 300,000,000 * 5 =
1,500,000,000 seconds. So breaking even would take 47 years (assuming
I did nothing else for the next 47 years, such as eating and sleeping
and actual programming and watching LOTR and so on).

So in fact it would be counter-productive to adopt the change, even if
I thought it were an improvement, which I don't.

You are right. I got carried away by the literal numbers and failed to
put them in perspective with the real world.
 

Tor Rustad

santosh said:
Tor said:
santosh said:
John Bode wrote: [...]

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.
Interesting! This could be a case of simpler constructions being more
easily understood by the optimiser.
I don't think so. My guess would be that either the measurement is
misleading, or strcpy() helps alignment and executes some code in
parallel.

Well, I did a small comparison test between the two versions on a string
of length 209,715,200 bytes, with the '.' character just before the
terminating null. I used clock() to time the functions. For four runs of
each version, here are the averages:

RH's version = 1.480000s
AT's version = 1.212500s

Yeah, that's more like it.

[system is Pentium Dual Core 1.6 GHz with 1 GB RAM]
So for this system at least, AT's version is significantly faster.

I have not seen the fgetline() code, but if the function in question is
called only *once*, then the OP "optimized" the *outer* loop; a
professional would go for the *inner* loop.

Your measurement showed that the OP posted nonsense: an insignificant
micro-optimization.
 

John Bode

John Bode wrote:

[...]
FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

ROFL!

However, the optimizer can be playing tricks with you here.

Oh, no doubt. But that kind of reinforces the idea that writing code
in a clear, straightforward way (that is, clear to anyone who *isn't*
an expert in C) is better than trying to save screen real estate.

I'm rewriting the test harness and running it on my home box (Gentoo);
the box I ran it on earlier was RHEL 3 running in a VMware session on
an XP Pro box that's running eleventy billion server processes. I'm
also going to try it with various string lengths and optimization
options.
Anyway, RH's code was 96% more readable and maintainable, so the OP was
trolling.

Again, no doubt.
 

Richard

John Bode said:
John Bode wrote:

[...]
FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

ROFL!

However, the optimizer can be playing tricks with you here.

Oh, no doubt. But that kind of reinforces the idea that writing code
in a clear, straightforward way (that is, clear to anyone who *isn't*
an expert in C) is better than trying to save screen real estate.

I'm rewriting the test harness and running it on my home box (Gentoo);
the box I ran it on earlier was RHEL 3 running in a VMware session on
an XP Pro box that's running eleventy billion server processes. I'm
also going to try it with various string lengths and optimization
options.
Anyway, RH's code was 96% more readable and maintainable, so the OP was
trolling.

Again, no doubt.

If you prefer RH's code I am astonished. But it takes all sorts, I
suppose. It's not "bad code" by any stretch of the imagination.

As was pointed out earlier, that kind of speed increase/decrease is
generally immaterial; however, I know for a fact that the time saved in
debugging and reading it is worth its weight in gold. If you cannot
understand such a bog-standard two lines, then there are issues in your
understanding of C.

while(*u++=(*s=='.' ? '_' : *s))
    s++;

It couldn't be easier.
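Those two lines drop straight into a compilable function; the malloc() framing here is borrowed from the versions posted elsewhere in the thread, and the function name is a stand-in:

```c
#include <stdlib.h>
#include <string.h>

/* The two-line loop above, wrapped so it can actually be compiled and
   run: one pass, copying each character and swapping '.' for '_'. */
char *dot_to_underscore_onepass(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    char *u = t;
    if (t != NULL) {
        while ((*u++ = (*s == '.' ? '_' : *s)))
            s++;
    }
    return t;
}
```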
 

Old Wolf

You are splitting hairs. It is functionally equivalent - its advantage
is a slightly shorter total length, but in exchange the loop doesn't
have an empty body.

The difference is more than you realise; here we are
talking about code readability and maintainability and
the modified version has two major improvements to your
original:
* The s++ only occurs once
* It has a loop body

And I find it amusing that you object to me posting under a Usenet
handle, when your own posts are from "Old Wolf".

I'm not objecting to you posting under a handle. Also,
I find it amusing that you think it improves code if
you can make a loop body empty.
 

¬a\\/b

On Tue, 09 Oct 2007 02:39:06 +0200, Richard wrote:
John Bode said:
John Bode wrote:

[...]

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

ROFL!

However, the optimizer can be playing tricks with you here.

Oh, no doubt. But that kind of reinforces the idea that writing code
in a clear, straightforward way (that is, clear to anyone who *isn't*
an expert in C) is better than trying to save screen real estate.

I'm rewriting the test harness and running it on my home box (Gentoo);
the box I ran it on earlier was RHEL 3 running in a VMware session on
an XP Pro box that's running eleventy billion server processes. I'm
also going to try it with various string lengths and optimization
options.
Anyway, RH's code was 96% more readable and maintainable, so the OP was
trolling.

Again, no doubt.

If you prefer RH's code I am astonished. But it takes all sorts, I
suppose. It's not "bad code" by any stretch of the imagination.

As was pointed out earlier, that kind of speed increase/decrease is
generally immaterial; however, I know for a fact that the time saved in
debugging and reading it is worth its weight in gold. If you cannot
understand such a bog-standard two lines, then there are issues in your
understanding of C.

while(*u++=(*s=='.' ? '_' : *s))
    s++;
It couldn't be easier.

this is easier
goto l2;
l1: ++s; ++u;
l2: *u=(*s=='.'? '_': *s);
if(*s) goto l1;


in 2 lines:
goto l2;
l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s) goto l1;

in one line with #==goto

#l2; l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s)#l1;
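For anyone who wants to check it, the goto version does compute the same thing once placed in a wrapper (the wrapper and allocation are assumptions; only the labelled lines come from the post):

```c
#include <stdlib.h>
#include <string.h>

/* The goto rewrite above in a compilable wrapper. It makes the same
   single pass: copy each character, swapping '.' for '_'. */
char *dot_to_underscore_goto(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    char *u = t;
    if (t == NULL)
        return NULL;
    goto l2;
l1: ++s; ++u;
l2: *u = (*s == '.' ? '_' : *s);
    if (*s) goto l1;
    return t;
}
```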
 

Philip Potter

Richard said:
Why would you post that?

Off Topic for a start and NOTHING to do with C.

The poster was quite correct in his improvements of RHs code. Regardless
of who or what he is.

Really? I see neither bug nor gross inefficiency in RH's original, which
was the OP's main claim. The code he posted is not grossly more
efficient; it's only a constant factor faster, not even in a different
big-O efficiency class.
 

Keith Thompson

¬a\\/b said:
On Tue, 09 Oct 2007 02:39:06 +0200, Richard wrote: [...]
while(*u++=(*s=='.' ? '_' : *s))
    s++;
It couldn't be easier.

this is easier
goto l2;
l1: ++s; ++u;
l2: *u=(*s=='.'? '_': *s);
if(*s) goto l1;


in 2 lines:
goto l2;
l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s) goto l1;

in one line with #==goto

#l2; l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s)#l1;

Stripped of whitespace, compressed with gzip, and base64 encoded:

H4sIAIUzC0cAA1POMbJWyDG0UtDWLrYGEqVAnpGVglaprYZWsa2tnr1CPJBXrGmt
kJkGFNFUzjG05gIAN60ghDUAAAA=

Ahh, much better.
 

¬a\\/b

On Mon, 8 Oct 2007 00:16:07 +0200 (CEST), Antoninus Twink wrote:
The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if(t != NULL)
    {
        char *u;
        strcpy(t, s);
        u = t;
        while(*u)
        {
            if(*u == '.')
            {
                *u = '_';
            }
            ++u;
        }
    }
    return t;
}

this is a big waste of vertical spaces
 

¬a\\/b

On Tue, 09 Oct 2007 09:31:52 +0200, ¬a\/b wrote:
On Tue, 09 Oct 2007 02:39:06 +0200, Richard wrote:
John Bode said:
John Bode wrote:

[...]

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

ROFL!

However, the optimizer can be playing tricks with you here.

Oh, no doubt. But that kind of reinforces the idea that writing code
in a clear, straightforward way (that is, clear to anyone who *isn't*
an expert in C) is better than trying to save screen real estate.

I'm rewriting the test harness and running it on my home box (Gentoo);
the box I ran it on earlier was RHEL 3 running in a VMware session on
an XP Pro box that's running eleventy billion server processes. I'm
also going to try it with various string lengths and optimization
options.

Anyway, RH's code was 96% more readable and maintainable, so the OP was
trolling.


Again, no doubt.

If you prefer RH's code I am astonished. But it takes all sorts, I
suppose. It's not "bad code" by any stretch of the imagination.

As was pointed out earlier, that kind of speed increase/decrease is
generally immaterial; however, I know for a fact that the time saved in
debugging and reading it is worth its weight in gold. If you cannot
understand such a bog-standard two lines, then there are issues in your
understanding of C.

while(*u++=(*s=='.' ? '_' : *s))
    s++;
It couldn't be easier.

this is easier
goto l2;
l1: ++s; ++u;
l2: *u=(*s=='.'? '_': *s);
if(*s) goto l1;


in 2 lines:
goto l2;
l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s) goto l1;

in one line with #==goto

#l2; l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s)#l1;

#.2
.0: B*i='_'; #.1;
.1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1
 

¬a\\/b

On Tue, 09 Oct 2007 10:06:04 +0200, ¬a\/b wrote:

#.2
.0: B*i='_'; #.1;
.1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1

there is one jmp too many above

#.2
.0: B*i='_'
.1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1

that should be something like
jmp short .2
.0: mov byte [esi], '_'
.1: inc esi
inc edi
.2: mov al, [edi]
cmp al, '.'
je .0
mov [esi], al
cmp al, 0
jne .1
 

Antoninus Twink

I have not seen the fgetline() code, but if the function in question is
called only *once*, then the OP "optimized" the *outer* loop; a
professional would go for the *inner* loop.

Your measurement showed that the OP posted nonsense: an insignificant
micro-optimization.

The point was that Mr Heathfield's code was a micro-anti-optimization.
If making the code harder to read in exchange for a small increase in
speed is bad, how much worse is it to make the code harder to read by
implementing a convoluted two-pass algorithm that's more complex and
slower than the natural, idiomatic one?

Or, if someone makes the small decisions badly, does that give us any
faith that they'll do better on the big decisions?
 

Richard Heathfield

Antoninus Twink said:
The point was that Mr Heathfield's code was a micro-anti-optimization.

No, Mr Heathfield's code was written in about a minute, and worked
perfectly first time, and is easy for even a very inexperienced C
programmer to understand. What's more, it is called once per program
invocation, and does its job in less than a microsecond. You can call it
what you like, but I call it a win.
If making the code harder to read in exchange for a small increase in
speed is bad,

Whether it is bad depends on the relative merits of performance and
clarity. I have already shown how I would have to use the program 24/7 for
almost half a century before your suggested change could save me so much
as a nanosecond (once the time cost of making that change is factored in),
and quite frankly I have better things to do with my life. So in this
case, the performance increase is meaningless, whereas the loss of clarity
is significant.
how much worse is it to make the code harder to read by
implementing a convoluted two-pass algorithm that's more complex and
slower than the natural, idiomatic one?

We have already demonstrated why "slower" is unimportant. If you seriously
think the original code is hard to read, then words fail me. It's
terribly, terribly simple C. It's astoundingly easy for most C people. I
simply cannot understand why you would find it difficult or complex.

Or, if someone makes the small decisions badly, does that give us any
faith that they'll do better on the big decisions?

We have different criteria. You appear to judge code by how "tight" it is.
I prefer to judge it by how readable and maintainable it is.

It is certainly true that I could have merged the copy and the replace
operation, but it is also true that doing so makes practically no
difference to the performance of the code. What you have called "gross
inefficiency" manages to outperform your code on at least one platform (as
reported elsethread), and doesn't perform significantly worse on others.

What's more, my code is easily understood and easily maintained by even
very inexperienced C programmers. For me, that's a key goal, because the
code I write is very often used and perhaps modified in environments over
which I have no control. So I like to keep things simple. If someone
discovers a way that they can shave 180 nanoseconds (that's 0.00000018
seconds!) off a function that is called once per program invocation, I'm
pleased for them, but that doesn't mean I have to incorporate their
changes into my code.
 

John Bode

The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings.

It could, except that on my system (Gentoo Linux, 2.6.22 kernel, gcc
4.1.2), it doesn't. In fact, it leads to overall *better* performance
on very long strings. I wrote a test harness that generates random
strings and calls both Richard's and Antoninus' versions of
dot_to_underscore, as well as a "control" version (basically a copy of
Richard's, except that it copies characters to the target buffer
manually instead of using strcpy()).

I ran three rounds of tests. Each round worked on a 2 MB string
(2097152 bytes), and for each run of the program, each version of the
function was called 100 times (basically so I could get some usable
times). For the first round, I compiled the code without any
optimization. For the second round, I compiled with the -O1 flag.
For the third round, I compiled with the -O2 flag. I used gprof to
get the time spent in each function.

For each round, I ran the program 10 times, logging the times from
each, and computed the average.

So, calling each function 100 times on a 2 MB string, I got the
following results (average time over 10 runs):

                        RH       AT       JB
                        --       --       --
No optimization         2.72 s   3.49 s   4.01 s
Level 1 optimization    1.33 s   1.93 s   1.74 s
Level 2 optimization    0.60 s   1.11 s   1.13 s

Hmmmm. Two things stand out. One, on my system, calling strcpy() is
*definitely* faster than copying each individual character in a loop
(as that is the only difference between RH and JB). Secondly, writing
code in a straightforward, "clunky" style seems to make it easier to
optimize.

Of course, these results are valid *only* for my particular system.
I'd be interested to see the results from different hardware/OS/
compiler combinations.
It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if(t != NULL)
    {
        char *u;
        strcpy(t, s);
        u = t;
        while(*u)
        {
            if(*u == '.')
            {
                *u = '_';
            }
            ++u;
        }
    }
    return t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
    char *t, *u;
    if(t=u=malloc(strlen(s)+1))
        while(*u++=(*s=='.' ? s++, '_' : *s++));
    return t;
}

It's time to re-learn those two important lessons:

1. Terse code does not necessarily equate to faster code
2. The only way to know which version of the code will be more efficient
   is to profile all versions under consideration.

More bad code is written in the name of "efficiency" than anything
else.
 

Richard

John Bode said:
It could, except that on my system (Gentoo Linux, 2.6.22 kernel, gcc
4.1.2), it doesn't. In fact, it leads to overall *better* performance
on very long strings. I wrote a test harness that generates random

On very long strings, yes. Why? Because there is no comparison blocking
every substitution/transfer. Clearly the strcpy will be faster.

The number of '.' characters will be a big factor as well.
It's time to re-learn those two important lessons:

1. Terse code does not necessarily equate to faster code
2. The only way to know what version of code will be more efficient
is to profile all versions under consideration.

More bad code is written in the name of "efficiency" than anything
else.

There are other lessons to learn:

1) Anyone can engineer any results when they feel the need. 2 MB
strings? For a 4% increase? No mention of the data involved? Come off it.
2) The code above is nothing to do with pure speed at runtime.

The code above is not "terse". It is bog-standard C. Sorry.

It is most certainly not "bad code" - it is a damn sight easier to look
at and maintain than RH's, in my humble opinion.

Two lines of core code beat 15 any day of the week when the functionality
is so trivial. And it IS trivial. AND we are not talking 2k-long lines
here either. Basic pointer dereferences and assignments.
 

Antoninus Twink

Richard Heathfield said:

Whether it is bad depends on the relative merits of performance and
clarity. I have already shown how I would have to use the program 24/7 for
almost half a century before your suggested change could save me so much
as a nanosecond (once the time cost of making that change is factored in),
and quite frankly I have better things to do with my life. So in this
case, the performance increase is meaningless, whereas the loss of clarity
is significant.

But exactly the opposite is true - clarity is lost in *your* version, by
taking something simple and making a meal of it. The average programmer
looking at your code is going to do a double-take, wonder why on earth
that strcpy() is there when you immediately make another pass through
the string, and have to convince himself that no, there really isn't any
funny business going on.

Compare that to two lines of idiomatic code.
We have already demonstrated why "slower" is unimportant. If you seriously
think the original code is hard to read, then words fail me. It's
terribly, terribly simple C. It's astoundingly easy for most C people. I
simply cannot understand why you would find it difficult or complex.

Yes, it's terribly, terribly simple. Almost babyish even. I did not mean
that your *code* is complex, but that the *algorithm* is complex. And
you will say, "But it's a simple algorithm!" Right, it isn't complex by
any objective standard of complexity, but it's *more complex than it
needs to be* - why swap a simple single-pass algorithm for a 2-pass
algorithm?

And if you make simple things over-complicated, we might not
unreasonably suspect that you might make complicated things into a
complete mess.
 
