Bug/Gross InEfficiency in HeathField's fgetline program

Richard · Oct 8, 2007

Richard Heathfield said:
Antoninus Twink said:

It appears to be from my emgen utility, in fact.

If it becomes a problem, I'll fix it. So far, it has not been a
problem.

Why write it to be slower for the same functionality? It's basic level 1
pointer stuff.

A matter of opinion. Which bit did you find hard to read?

It covers about 20 lines - this is harder to read when the other is
about 4 and easy to follow. In my experience, excessive white space is
rarely welcome in the real world, especially when using a debugger.

It is not obvious to me that this code correctly replaces the code I
wrote.

Which part do you think doesn't "correctly" replace the code? Always
better to highlight the errors since we are all prone to making them. I
guess there is some feature in your code that you are alluding to that
is not obvious to others.

Richard · Oct 8, 2007

Thad Smith said:
What happens when malloc returns a null pointer?

It returns null, obviously. Why?

Antoninus Twink · Oct 8, 2007

He hasn't said that this is what he believes. He is stating that your
code does not *obviously* replace his correctly. It is up to you to
prove it does, if you're going to say his is defective and you have come
up with a replacement.

On the contrary: the code was four lines and used common C idioms. If it
isn't completely obvious to someone (or at the very least if they can't
mentally check in 10 seconds) what it does, then I don't believe they
know the language well enough to write it professionally.

(And if in turn you read my words to the letter, you'll notice that I
make no claim that Mr HeathField falls into this class of people: it is
my belief that he is being deliberately obtuse.)

Antoninus Twink · Oct 8, 2007

I would move the ++ part

,----
| while(*u++=(*s=='.' ? '_' : *s))
| s++;
`----

But yes, much nicer, easier to read and understand and possibly
faster. And hopefully working :-;

That's indeed easier to read, though it's always satisfying to write a
loop with no body
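
For anyone who wants to try the loop quoted above, here is a minimal sketch that wraps it in a complete function. Only the two-line loop comes from the post; the function name `dots_to_underscores`, the allocation and the null check are assumptions added for illustration, and this is not claimed to be the code either side actually posted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: return a newly allocated copy of s with every
 * '.' replaced by '_', or NULL if malloc fails. The caller frees it. */
char *dots_to_underscores(const char *s)
{
    assert(s != NULL);                 /* precondition: s is a valid string */

    char *t = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */
    char *u = t;

    if (t != NULL) {
        /* Copy one character per iteration, substituting as we go.
         * The loop ends once the terminator itself has been copied. */
        while ((*u++ = (*s == '.' ? '_' : *s)) != '\0')
            s++;
    }
    return t;                          /* NULL passes the failure to the caller */
}
```

If malloc fails, the function just hands the null pointer back, so the caller has to check the result before using it.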

Richard Heathfield · Oct 8, 2007

Richard said:

Which part do you think doesn't "correctly" replace the code?

That's not what I said.

Richard · Oct 8, 2007

Philip Potter said:
He hasn't said that this is what he believes. He is stating that your
code does not *obviously* replace his correctly. It is up to you to

Don't be ridiculous. It was quite clear what RH meant. The shorter code
is FAR more obvious to me anyway - especially since it removes 2
unnecessary string operations which may or may not have caused hard to
spot behaviour. The shorter version is far, far more obvious in every
way. This is, of course, IMO, and I am sure that a nOOb who can't read
the "?" operator and has no idea about post-increment and pointer
dereferencing might have difficulties with the compact nature of the
second version. Personally I think it is easy enough to read at first
glance and the reduction in library calls makes it *easier* to
understand. Having said that, I haven't tested or scrutinised the code
to ensure it reproduces in all cases exactly what the original code
produces.

prove it does, if you're going to say his is defective and you have
come up with a replacement.

No one said it was deceptive. heatjfield suggested it didn't do the same
thing "obviously". The original was, however, long-winded, overly
white-spaced and unnecessarily inefficient IMO. Real code review if you
like. No one needs 20 lines for such a simple piece of code. This is C:
elegance and efficiency are everything. I suspect that it was overly
wordy for a reason since I would be surprised to see production quality
code like that.

Richard · Oct 8, 2007

Joachim Schmitz said:
20 lines don't fit your screen???

What a ridiculous argument.

It doesn't fit my source window in gdb for a start.

Why 20 lines when it can be 4 perfectly legible, efficient lines?

Don't defend code just because of the source. The second version is
superior for production level code in at least two ways:

1) screen real estate (this IS important in the real world).
2) less library calls. less to worry about IMO.

Richard · Oct 8, 2007

Richard Heathfield said:
Philip Potter said:

I always knew you could read, Philip. Some other people, I'm not so
sure about.

It was perfectly obvious what you meant. Word games don't even begin to
cover it.

Richard · Oct 8, 2007

Richard Heathfield said:
Richard said:

That's not what I said.

Yes it is. If it's not "obvious" then there is an issue. It is obvious to
me. Which part is not obvious to you?

Richard Heathfield · Oct 8, 2007

Richard said:

Don't be ridiculous. It was quite clear what RH meant.

Well, I agree, but you appear to have misunderstood it nonetheless. I said
*precisely* what I meant to say, which is that it was not obvious to me
that this code correctly replaces the code I wrote.

No one said it was deceptive.

Nor did anyone say it was defective.

heatjfield suggested it didn't do the same thing "obviously".

Well, I said it wasn't obvious to me that it does the same thing. Your
paraphrase of what I said is susceptible to interpretations that cannot
reasonably be drawn from my original phrasing.

<snip>

santosh · Oct 8, 2007

Richard said:
[ ... ] heatjfield suggested [ ... ]

What is it with people mangling Richard's surname?

Richard Heathfield · Oct 8, 2007

Richard said:

Yes it is.

No, it isn't. Read it again.

If it's not "obvious" then there is an issue.

Yes. The issue is that the suggested replacement code is overly terse and
difficult to read.

It is obvious to me. Which part is not obvious to you?

Not least the need to compress functionality into the fewest possible
source characters at the expense of readability and maintainability. I
recognise that a terse style is considered "cool" by some, but in my
experience it is merely expensive.

I still haven't bothered to attempt to /read/ the suggested replacement
code, partly since its style discourages reading, and partly because of
its source. If, as I suspect, the OP is just a sock-puppet for the guy who
threatened to break my nose (see elsethread), I'm not terribly interested
in working out whether the functionality of his code does or does not
accurately reflect that of my own.

santosh · Oct 8, 2007

Richard wrote:

Why 20 lines when it can be 4 perfectly legible, efficient lines?

Don't defend code just because of the source. The second version is
superior for production level code in at least two ways:

1) screen real estate (this IS important in the real world).
2) less library calls. less to worry about IMO.

As far as the second point applies to this specific example, it could be
possible that the strcpy implementation is more efficient than a manual
coded loop. This could become noticeable for large string numbers or
sizes.

Minor point, but since this whole thread is nit-picking, I thought I'd
mention it.
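
To make the comparison concrete, here is a sketch of the "library call first" shape: let strcpy do the copying, then fix up the dots in a second pass. The longer original is not reproduced in this part of the thread, so treat this as an assumption about its general shape and not as the actual code; the function name is made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-pass variant: copy with strcpy, then replace each
 * '.' with '_' in the copy. Returns NULL if malloc fails. */
char *dots_to_underscores_two_pass(const char *s)
{
    assert(s != NULL);                 /* precondition: s is a valid string */

    char *t = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */

    if (t != NULL) {
        strcpy(t, s);                  /* pass 1: library copy */
        for (char *u = t; *u != '\0'; ++u) {
            if (*u == '.')             /* pass 2: in-place substitution */
                *u = '_';
        }
    }
    return t;
}
```

Whether the extra pass costs anything measurable depends on the library and the compiler, so neither shape can be called faster without timing both on the target system.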

Richard Heathfield · Oct 8, 2007

santosh said:

Richard said:

[ ... ] heatjfield suggested [ ... ]

What is it with people mangling Richard's surname?

It's sheer ignorance. Getting someone's name right is a basic mark of
respect, which is why I *always* take care with names - I hate getting
them wrong, and have done so only rarely. "Kylheku" and "Dijkstra" are two
that I've got wrong in the past, but I always get them right now.

If someone continually gets a name wrong, it's a reasonable sign that they
consider that person to be not worth the bother of taking trouble for.
Richard Riley's inability or unwillingness to take out a second or two to
get my name right is therefore suggestive. And if he is so short of time
and care when composing Usenet articles that he *doesn't even have time to
get my name right*, I see no reason why I or anyone should accord his
hurried, careless views any weight. Furthermore, I would venture to
suggest that there *may* be a correlation between those who write hurried,
careless Usenet articles and those who write hurried, careless C.

And if someone habitually gets a particular name wrong in precisely the
same way that a violence-threatening troll habitually does (not Richard
Riley, I hasten to add, but this Antoninus Twink character that everyone
is treating so seriously), then it's hard to avoid the conclusion that
we're dealing with a sock puppet.

Joachim Schmitz · Oct 8, 2007

Richard said:
What a ridiculous argument.

Indeed. Not mine though, but Antoninus'...
It is definitely easier to read. Yes, it could have been written more compactly;
I esp. disliked the "return t;" in 2 lines, and with a different brace style
one could save another couple of lines and still keep it simple.

It doesn't fit my source window in gdb for a start.

Esp. in a debugger it is much easier to follow the flow if not everything is
done in one line.

Why 20 lines when it can be 4 perfectly legible, efficient lines?

See above: because it is easier to debug.
For that reason I liked your improvement to Antoninus' version (moving the ++
part into the otherwise empty body of the loop).

Don't defend code just because of the source.

?? You mean RH being the source and I defend it because of that? Or what
else are you trying to say here?

The second version is
superior for production level code in at least two ways:

1) screen real estate (this IS important in the real world).
2) less library calls. less to worry about IMO.

We're talking about one call to strcpy, which in a decent implementation is
lightning fast and highly optimized, so saving it doesn't really save much.

Bye, Jojo

Joachim Schmitz · Oct 8, 2007

Antoninus Twink said:
That's indeed easier to read, though it's always satisfying to write a
loop with no body

Try to resist the temptation, it's hard, I know...

Bye, Jojo

santosh · Oct 8, 2007

Richard said:
santosh said:

Richard said:

[ ... ] heatjfield suggested [ ... ]

What is it with people mangling Richard's surname?

It's sheer ignorance. Getting someone's name right is a basic mark of
respect, which is why I *always* take care with names - I hate getting
them wrong, and have done so only rarely. "Kylheku" and "Dijkstra" are
two that I've got wrong in the past, but I always get them right now.

If someone continually gets a name wrong, it's a reasonable sign that
they consider that person to be not worth the bother of taking trouble
for. Richard Riley's inability or unwillingness to take out a second
or two to get my name right is therefore suggestive. And if he is so
short of time and care when composing Usenet articles that he *doesn't
even have time to get my name right*, I see no reason why I or anyone
should accord his hurried, careless views any weight. Furthermore, I
would venture to suggest that there *may* be a correlation between
those who write hurried, careless Usenet articles and those who write
hurried, careless C.

And if someone habitually gets a particular name wrong in precisely
the same way that a violence-threatening troll habitually does (not
Richard Riley, I hasten to add, but this Antoninus Twink character
that everyone is treating so seriously), then it's hard to avoid the
conclusion that we're dealing with a sock puppet.

Regarding your last paragraph, there is strong evidence that
this "Antoninus Twink" and "Paul" (having an email beginning
with "paulcr"), who posted a few months ago, are the same.

<http://groups.google.com/group/comp.lang.c/msg/f2e171fc9691cd7d?dmode=source>
<http://groups.google.com/group/comp.lang.c/msg/b571a9a155facb48?dmode=source>

As you can see, the "User-Agent" and "NNTP-Posting-Host" fields in the
header are identical. Further, the timezone is also the same.

Whether the "Paul" in the second message above is indeed the one who
threatened you is not so easily verifiable, but it seems very likely.

Antoninus Twink · Oct 8, 2007

Not least the need to compress functionality into the fewest possible
source characters at the expense of readability and maintainability. I
recognise that a terse style is considered "cool" by some, but in my
experience it is merely expensive.

Wading through 20 lines of code that do a 4-line job in a roundabout way
can also be expensive.

I still haven't bothered to attempt to /read/ the suggested replacement
code, partly since its style discourages reading, and partly because of
its source. If, as I suspect, the OP is just a sock-puppet for the guy who
threatened to break my nose (see elsethread), I'm not terribly interested
in working out whether the functionality of his code does or does not
accurately reflect that of my own.

It's a pair of declarations, a simple return statement, and two lines of
moderately dense but completely idiomatic C. It's not like you need to
set aside a rainy afternoon to dedicate to reading it.

Actually I suspect that you've read it closely, because you'd like
nothing more than to humiliate me by pointing out a bug in it. As it's
manifestly correct, you've turned to word games instead of admitting
that your own code might, just might, admit some improvement.

And who on earth has said anything about breaking your nose? It sounds
to me like you're suffering from some form of paranoia.

santosh · Oct 8, 2007

Antoninus said:
Wading through 20 lines of code that do a 4-line job in a roundabout
way can also be expensive.

And who on earth has said anything about breaking your nose? It sounds
to me like you're suffering from some form of paranoia.

The email address of the poster who wrote this threatening message:
<http://groups.google.com/group/alt.comp.lang.learn.c-c++/msg/82c5c4b2e59984e0?dmode=source>

And this message:
<http://groups.google.com/group/comp.lang.c/msg/b571a9a155facb48?dmode=source&utoken=0qjyGSsAAAAoZoYpti9uhwqsYFEYUqETYYZ-k6tkIFJmSBA4M7hURISd8ZKZA3rNsODuE9WiCO4>

begins identically.

Your messages in this thread share with the second message linked above
identical "User-Agent" and "NNTP-Posting-Host" headers, as well as the
timezone.

A is related to B.
B is related to C.

So

A is related to C.

Richard Heathfield · Oct 8, 2007

santosh said:

Richard wrote:

As far as the second point applies to this specific example, it could be
possible that the strcpy implementation is more efficient than a manual
coded loop. This could become noticeable for large string numbers or
sizes.

Yes, it's certainly possible. So is the reverse, of course. The only proper
course is to measure, realising that the measurement will be specific to a
particular implementation on a specific machine.
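
As a rough illustration of what "measure" can look like without a profiler, here is a sketch using the standard clock() function. Everything in it is an assumption for illustration: the names `replace_dots` and `time_replace_dots`, the all-dots worst-case input, and the choice to time the buffer refill together with the conversion. It is not the gprof setup described in this post.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Keeps the compiler from discarding the work being timed. */
static volatile unsigned long sink;

/* Hypothetical function under test: replace '.' with '_' in place. */
static void replace_dots(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (*s == '.')
            *s = '_';
    }
}

/* Return the CPU time in seconds spent converting a string of `len`
 * dots `reps` times, or -1.0 if the buffer cannot be allocated.
 * Note that the refill (memset) is timed along with the conversion. */
double time_replace_dots(size_t len, long reps)
{
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return -1.0;

    clock_t start = clock();
    for (long i = 0; i < reps; ++i) {
        memset(buf, '.', len);          /* worst case: every char is a dot */
        buf[len] = '\0';
        replace_dots(buf);
        sink += (unsigned char)buf[0];  /* use the result */
    }
    clock_t end = clock();

    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

The figure it returns only means something for the compiler, library and machine it was run on, and clock() is too coarse to time a single call on a short string, which is why the loop repeats the work.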

I have measured the performance of the code - my code - on an Athlon 1.4
(no slouch, but hardly a state-of-the-art mean machine) under gcc 2.95.3,
using an input file over 10 Megabytes in size (a typical real world input
would be a handful of kilobytes). Exact input file size: 11057780 bytes.
Number of lines: 120,000 (compared to a typical real world input of
perhaps a few dozen, or maybe three or four thousand for a fairly large
system).

The profiler reports that the program took 0.1 seconds to run (on inputs
that are orders of magnitude larger than would be expected in production).
The code whose performance is in question is called ONCE, by the way, and
the profiler (which claims to measure in microseconds) reports that the
function takes zero time to run. Obviously that can't be literally true,
but it's certainly too small for my gprof implementation to measure. It
might be reasonably argued that it takes almost a microsecond.

The purpose of the program is to take as input a list of error messages and
identifiers, and convert these into a .h file with #defines for the error
identifiers, and a .c file with a function that converts a number into an
error message.

It's a programmer's tool, and requires as input a file that is used by a
programmer to store an intelligent identifier/message pair. To edit such a
file, for a superhuman programmer like Chris Torek, might take as little
as - what, five seconds? (Wow, watch those fingers fly) And the next step
would be to run the program (perhaps automatically, on saving the file).
If this superhuman programmer did nothing but updates to the input file
all day every day, he would cause 86400/5 = 17280 program runs per day. If
the file size is as large as in my test (deeply unlikely in the real
world, but just about possible for a really, really, really large
project), the total program time taken would be a little less than half an
hour (29 minutes 28 seconds, in fact) - remember that this is for over
seventeen THOUSAND runs.

If, for the sake of argument, the OP's code is correct and if it reduces
the cost of converting dots to underscores from 1 microsecond to *zero*,
the total time saving per day will be 0.01728 seconds. Over a thousand
years of 24hrs/day of running this program 17280 times a day, the total
time saved will be about an hour and three quarters.
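
The arithmetic in the last three paragraphs can be checked with a few lines of C. This is only a sketch: the function names are made up, and the 365.25-day year is an assumption, since the post does not say which year length was used.

```c
#include <assert.h>

#define SECONDS_PER_DAY 86400L
#define DAYS_PER_YEAR   365.25   /* assumption: average year length */

/* Program runs per day if the input file is saved every
 * `seconds_per_edit` seconds, around the clock. */
long runs_per_day(long seconds_per_edit)
{
    assert(seconds_per_edit > 0);
    return SECONDS_PER_DAY / seconds_per_edit;
}

/* Seconds saved per day if each run gets faster by
 * `seconds_saved_per_run`. */
double seconds_saved_per_day(long runs, double seconds_saved_per_run)
{
    return (double)runs * seconds_saved_per_run;
}

/* Total saving in hours after `years` of running every day. */
double hours_saved(double saved_per_day, double years)
{
    return saved_per_day * DAYS_PER_YEAR * years / 3600.0;
}
```

With the figures from the post, runs_per_day(5) gives 17280, seconds_saved_per_day(17280, 1e-6) gives about 0.01728, and hours_saved(0.01728, 1000.0) comes out at roughly 1.75, which matches the "hour and three quarters" above.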

Compare this to the time spent on the discussion so far.
