Bug/Gross InEfficiency in HeathField's fgetline program

  • Thread starter Antoninus Twink
  • Start date
D

dbtid

On very long strings yes. Why? because there is no comparison blocking
every substitution/transfer. Clearly the strcpy will be faster.

Wasn't that the point of all the attacks? That it *wouldn't* be faster?
If that wasn't the point, why did you EVER open your mouth (figuratively
speaking)?

If the test isn't good enough for you, why don't you make a comparison in
which 50% of the bytes are '.' and see what happens? Or why not do it for
0%-100% and then post your results? That way you can be sure no one's
being fooled.
The number of "." will be a big input as well.

ITYM '.', not ".". Any proficient C programmer would know this.
Amen.

Amen.


There are other lessons to learn:

1) Anyone can engineer any results when they feel the need. 2 meg
strings? for a 4% increase? No mention of the data involved? Come off it.

Well, he said it was random. How is that an attempt to engineer results?
2) The code above is not to do with pure speed at runtime

The code above is not "terse". It is bog standard C. Sorry.

Why does the code have to be terse? I'm not sure what "bog standard C"
is. I've looked, and I can't find any IEC/ISO "bog standard" versions of
C. (If you don't see the humour in that, you need more help than just in
learning C and how to communicate on Usenet.)
It is most certainly not "bad code" - it is a damn site easier to look
at and maintain than RHs in my humber opinion.

How is it easier? Because there's less of it? I'm guessing you've never
read any of the IOCCC entries.

And, would you *PLEASE* get a spell checker, or a dictionary, or a 10
year-old to check your spelling and grammar??

And as for your "humble opinion" (correction mine), you are, from what
I've observed, a far cry from humble.
You show absolutely no humility at all in any of your postings. FYI, a
critical spirit is NOT a humble spirit. For some reason, you've got an
axe to grind.

Almost all of your postings on this topic that I've read are critical of
Heathfield for the sake of being critical.
2 lines of core code beats 15 any day of the week when the functionality
is so trivial. And it IS trivial. AND we are not taling 2k long lines
here either. Basic pointer dereference and assignments.

As a generic comment, that's pure bullshit. You could write several loops
on two lines, which could easily be made "trivial," and they'd be a damn
sight more difficult to read than those same loops put on different lines.

You would do the people/computers/aliens who read c.l.c a great service if
you would find something more worthwhile to do with your time than make
all these ridiculous assertions.

Several things come to mind, but you'd probably take offense, so I won't
post them here.
 
T

Tor Rustad

Richard wrote:

[...]
The shorter more concise version would have got through any code review
long before the long winded one on any project I have worked on.

OP's version, would *never* have passed through a code review by me at
least. If I locate such *student code* in safety-critical and/or
security sensitive SW, the subcontractor has received their last
contract from "us", unless the junior programmer is taken off the
project and his code is rewritten.

In one funny case, the sub-contractor hired me to do their re-write, I
did pass it on code review thereafter! :)
 
D

dbtid

He most certainly is not.

I don't know how long you've been writing C, but I've been doing it for 20
years now.

If I were looking at that code with an eye to making it run as fast as
possible, I'd probably benchmark it written a couple of different ways.
I've done embedded systems for most of that 20 years, and have had
occasion to lock sections of code down in cache so that I could meet
performance metrics. I have no idea what your level of expertise is, but
from your arguments and the comments you make, I suspect it's not HPC.
The shorter more concise version would have got through any code review
long before the long winded one on any project I have worked on.

Not the case on any project I've been on. The use of the comma operator,
for one, would be seen as unwise, as well as assignment from the result of
the trinary operator.

You like terse code? Great. Use it. Then give it to a co-op or intern
or new grad and see how well he can read it. Odds are, they won't
understand that. I know I wouldn't hire you if I saw samples of your code
that looked like this everywhere.

From memory, I think the loop you posted was something like:
while (*u++ = (*s == '.' ? '_' : *s))
s++;

While it's short, it has at least 8 pieces of semantic information all
stuffed together. It's simply too much for the average C programmer to
take in at a glance.

On the other hand, the original code that the twink posted which he
claimed was from Heathfield, was easy to read because the meaning wasn't
all stuffed together.

Code the way you want, but please stop making these asinine assertions
about how idiomatic conciseness is the only way to write good code.
This is getting ridiculous.

You and twink pushed WAY past ridiculous a great number of posts in the
past.
 
T

Tor Rustad

kuyper said:
Richard wrote:
...

I have interviewed several dozen people for C programming positions
over the past 15 years. I've given every single one of them a simple
program to understand and explain and suggest improvements for. The
heart of that program was the following loop:

while(*p++ = *q++);

Only about half of them could even tell me correctly what that loop
does; not a single one has ever correctly explained how it does it.

Recently I was involved in hiring a security officer with C programming
skills. Those who passed the initial screening, had at least a master
degree, but their experience varied a lot.

I ended up recommending the only one, with 0 work experience, who
admitted he didn't knew C well. The seniors, failed big time
implementing strncpy() on the blackboard. Very embarrassing.
 
J

John Bode

On very long strings yes. Why? because there is no comparison blocking
every substitution/transfer. Clearly the strcpy will be faster.

So you agree that the OP's argument holds no merit. After all, the
title of the thread starts with "Bug/Gross InEfficiency...", when the
truth is that, if anything, Heathfield's code winds up being more
efficient than the suggested alternative.
The number of "." will be a big input as well.

It's on the order of 30,000 for the 2 MB string, or 1.4%. That's
randomly selecting from the printable character set.
There are other lessons to learn:

1) Anyone can engineer any results when they feel the need. 2 meg
strings? for a 4% increase? No mention of the data involved? Come off it.

Strings were randomly generated from the printable character set. And
2 meg gave me usable numbers from gprof.
2) The code above is not to do with pure speed at runtime

Except that the OP's argument had everything to do with speed at
runtime; he was ranting that because RH made three passes through the
string, his code could lead to unnecessarily bad performance. I
simply ran a test to see if that would hold on my system. It
doesn't.

In fact, I tweaked Richard's code a bit and used strchr() to search
the string for '.' characters:

char *dot_to_underscore_RH (const char *s)
{
char *t = malloc(strlen(s) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(u)
{
u = strchr(u, '.');
if (u)
{
*u = '_';
}
}
}
return t;
}

No doubt you would consider it ugly, but it results in a *huge*
performance boost on very long strings: less than a third of the time
it takes AT's code.
The code above is not "terse". It is bog standard C. Sorry.

It's both. Yes, it's perfectly idiomatic C, but it's also
unnecessarily terse.
It is most certainly not "bad code" - it is a damn site easier to look
at and maintain than RHs in my humber opinion.

Which is another data point in favor of my theory that C causes brain
damage. Look, 10 years ago I made the exact same arguments as you
are, that if you know C then this shit ought to be obvious. Then I
grew up and realized that not everyone who has to read and maintain C
code inhaled Harbison & Steele the way I had.
2 lines of core code beats 15 any day of the week when the functionality
is so trivial. And it IS trivial. AND we are not taling 2k long lines
here either. Basic pointer dereference and assignments.

But what about non-trivial functionality? What about cases where
performance really does matter? By saving some vertical space, AT
introduced some runtime inefficiency. Is it still reasonable to argue
that screen real estate should drive implementation decisions?
 
R

Richard Heathfield

Tor Rustad said:

I ended up recommending the only one, with 0 work experience, who
admitted he didn't knew C well. The seniors, failed big time
implementing strncpy() on the blackboard. Very embarrassing.


Well, I'm game. Is this a blackboard? Why, yes, it is (although it's
actually white, but never mind).

Okay, it's an interview, so I'm not allowed to look stuff up. So, off the
top of my head, strncpy copies no more than n characters from s to t,
stopping at a null terminator if present, and zero-padding t. It then
returns t. I can't actually remember whether n is size_t or int. (It ought
to be size_t, of course, but then so ought the n in fgets.) So I'll risk
embarrassment by plumping for size_t.

#include <stddef.h>

char *strncpy(char *t, const char *s, size_t n)
{
char *u = t;
while(n > 0 && *s != '\0')
{
*t++ = *s++;
--n;
}
while(n-- > 0)
{
*t++ = '\0';
}
return u;
}

How did I do? Should I start blushing yet?
 
¬

¬a\\/b

In data Tue, 09 Oct 2007 14:17:59 +0530, santosh scrisse:
#.2
..e: a=0; jmp exit /* here exit erron
..0: B*i='_'
..1: ++i; jz .e; ++j; jz .e; .2: al=*j; al=='.'#.0; *i=al; al#.1
Your programs would be more readable if you decided to use either one of
l33t or brainf?uk.

i think that a routine has to deal with every input possible and
impossible: so have to detect all, and report its result

so no one have detect if the array is not a string

and so the 'brainfuk' language version is better that all your ones
all togheter :)
 
M

Malcolm McLean

Antoninus Twink said:
It's the sort of thing that attracted me to C in the first place. I
suspect I'm not alone.
To know the secret codes is a powerful motive. But it is the attiude of a
hacker, not a software engineer.
 
T

Tor Rustad

Richard said:
Tor Rustad said:




Well, I'm game. Is this a blackboard? Why, yes, it is (although it's
actually white, but never mind).

The actual one, was white too. :)
Okay, it's an interview, so I'm not allowed to look stuff up. So, off the
top of my head, strncpy copies no more than n characters from s to t,
stopping at a null terminator if present, and zero-padding t. It then
returns t. I can't actually remember whether n is size_t or int. (It ought
to be size_t, of course, but then so ought the n in fgets.) So I'll risk
embarrassment by plumping for size_t.

First task indeed, would be to explain how this standard function works.
Knowing about the zero-padding, would give you a plus, but more
important would be if the candidate said something about the security
related usage pitfalls.

#include <stddef.h>

char *strncpy(char *t, const char *s, size_t n)
{
char *u = t;
while(n > 0 && *s != '\0')
{
*t++ = *s++;
--n;
}
while(n-- > 0)
{
*t++ = '\0';
}
return u;
}

How did I do? Should I start blushing yet?

Well done! :) Obviously dealing with a senior C programmer here. I
would myself, rather have written aka

char *strncpy(char *dest, const char *src, size_t n)
{
char *d = dest;

[...]

return dest;

to make the code more readable, but by the first sight, I can't spot any
mistake in your code.

OTOH, you didn't provide pre/post-conditions via e.g. assert for the UB
cases. My follow-up question, would be to see some assert's checks for UB.

However, your C skills are sufficient, so we would rapidly move over to
other technical and relevant topics. :)
 
R

Richard Heathfield

Tor Rustad said:

Knowing about the zero-padding, would give you a plus, but more
important would be if the candidate said something about the security
related usage pitfalls.

There aren't any - strncpy is perfectly secure if used properly. If we're
allowed to use strncpy improperly, I'll get my coat.


[...] I would myself, rather have written aka

char *strncpy(char *dest, const char *src, size_t n)

/nod - fair enough, although the convention of t = target, s = source is
relatively well-established. [Checks K&R2.] Except in K&R2. [Checks
Standard.] And except in the Standard. [Checks H&S4.] And except in H&S4.
[Checks D&D5.] And except in D&D5. In fact, it doesn't seem to be even
remotely well-established in the literature. Odd. I've seen it around
quite a lot. Maybe it's just me that uses that convention - but in *my*
code it's very we... hmmm... [Checks own code.] Er, no, in fact it's not
even all that common in my own code. But I do use it sometimes. Like, er,
for example, er, er... here!

OTOH, you didn't provide pre/post-conditions via e.g. assert for the UB
cases. My follow-up question, would be to see some assert's checks for
UB.

Fair comment again. Clearly both pointers can be checked to ensure that
they are not null pointers. Whether n should be asserted to be > 0 is
arguable.

<snip>
 
P

Peter Nilsson

Richard Heathfield said:
[snip strncpy discussion]
OTOH, you didn't provide pre/post-conditions via e.g. assert
for the UB cases. My follow-up question, would be to see some
assert's checks for UB.

Fair comment again. Clearly both pointers can be checked to
ensure that they are not null pointers.

You could even test for overlapping buffers by testing each
destination byte prior to writing, but I personally wouldn't
bother with any assert, let alone that one.
Whether n should be asserted to be > 0 is arguable.

Not really, since n == 0 is explicitly allowed.
 
P

Peter Nilsson

Richard Heathfield said:
Tor Rustad said:

There aren't any -

That alone would be enough to put most people out of the
running in my book.
strncpy is perfectly secure if used properly.

This wouldn't help.

Risk assessment is about what might go wrong. More important
is how you mitigate identified risks. But you're not in a
good position to do that if you can't even identify them!

My next questions would be: "Can you think of any ways it
might be used improperly? What steps would you take to avoid
using it improperly?" My tone and body language would no doubt
give away that the previous response had placed the interviewee
on a precipice, assuming I hand't already thanked them for
their time. ;)
 
R

Richard Heathfield

Peter Nilsson said:
That alone would be enough to put most people out of the
running in my book.

That's why I made the remark about my coat.
This wouldn't help.

Despite being true. Truth often goes down badly at interviews. I once blew
an interview because I corrected the question (which was in the form of a
written program, in which they had voided main - had that been what they'd
wanted me to spot, I'd have been fine, but it turns out they thought main
really did return void).
Risk assessment is about what might go wrong. More important
is how you mitigate identified risks. But you're not in a
good position to do that if you can't even identify them!

The risk is not in strncpy. The risk is in picking the wrong programmer for
the job. If you pick someone who thinks otherwise, well, that's a risk
too.
My next questions would be: "Can you think of any ways it
might be used improperly?

And I'd answer "Yes, of course. I can think of a way a child's toy brick
can be used improperly, too. I don't think we're on the same page here."
What steps would you take to avoid using it improperly?"

My reply to this one would be "I wouldn't. Instead, I'd take steps to use
it properly, which means (a) never using it at all unless it really is the
right function for the job, which does happen but is vanishingly rare; and
(b) ensuring that there is sufficient storage to receive *all* the data I
intend to copy." I would not bother to mention anything about strings,
since I don't particularly think of strncpy as being a string function.
My tone and body language would no doubt
give away that the previous response had placed the interviewee
on a precipice,

My tone and body language would perhaps indicate that I was on the point of
setting off for home...
assuming I hand't already thanked them for their time. ;)

....and crossing the company off my list for wasting my time. :)
 
M

Mark McIntyre

Your refusal to comment is a comment.

"You might well think that; I couldn't possibly comment"

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
T

Tor Rustad

Richard said:
Tor Rustad said:



There aren't any - strncpy is perfectly secure if used properly.

Oh dear, remember that the position in question, was for a *security
officer*.

Yesterday, I checked the latest update for OpenSSL, it included a
security fix for an off-by-one bug. Would you bet your life on, this
wasn't strncpy related?

To me, "perfectly secure" doesn't exist, unless we do mathematics, and
*define* ourself outside the real world. :)
If we're allowed to use strncpy improperly,

That isn't really the issue, you have got it all backwards, as a senior
security officer, you are supposed to think about everything that can go
*wrong*. Getting risk down to acceptable level, require analysis and
countermeasures, thinking about all sorts of failures.

Protecting against buffer-overflow and off-by-one bugs, is really kids
stuff, nevertheless needs to be done. No point in building a fortress on
quick-sand.

The main concern on high-value systems, is often to protect against the
insider threat, which is rather a difficult job. Those insiders, not
only have physical access, but may even be security/software/hardware
experts, capable of all kinds of attack.
I'll get my coat.

Doesn't matter, you blew it. I rather prefer to train a smart junior,
than to change the mindset and attitude of a senior. :)
Fair comment again. Clearly both pointers can be checked to ensure that
they are not null pointers.

What about the non-trivial case, where buffers overlap?
Whether n should be asserted to be > 0 is arguable.

Remember, n = 0 isn't UB. What would be the pro/cons of such a check?
 
R

Richard Heathfield

Tor Rustad said:
Oh dear, remember that the position in question, was for a *security
officer*.

It was? Oh, in that case I didn't apply. Who wants to prowl around the
grounds at 3am shining a torch into random darknesses and occasionally
getting smacked on the back of the head with a crowbar? Not my cup of tea
at all.
Yesterday, I checked the latest update for OpenSSL, it included a
security fix for an off-by-one bug. Would you bet your life on, this
wasn't strncpy related?

I have no idea. But if it were, then clearly strncpy wasn't used properly.
:)
Doesn't matter, you blew it.

Well, it was blown, anyway. Let's not point fingers.
I rather prefer to train a smart junior,
than to change the mindset and attitude of a senior. :)

Yes, I can understand that - but on the other hand, obviously I don't think
there's anything wrong with my mindset and attitude, because otherwise I'd
change it myself. And *I* would change it. It's not your job to change my
mindset.
What about the non-trivial case, where buffers overlap?

I'm not entirely sure this can be done in standard C, at least not the
obvious way, although I'd be glad to be proved wrong. Am I not right in
saying that you can't "well-definedly" compare two pointers relationally
unless they point into the same object? If so, you'd have to compare using
s + i == t (or s + i != t, depending on which way you're going) for all i
< n, and t + i == s for all i < n as well. Not funny. But yes, it could be
done.

Remember, n = 0 isn't UB. What would be the pro/cons of such a check?

Chances are good that it's a bug in the caller, but it depends on whether
you view n = 0 as a legitimate case or not. Some do, and some don't.
That's why it's arguable. (The Standard /does/ view it as a legitimate
case, of course.)
 
M

Mark McIntyre

Tor Rustad said:


Yes, I can understand that - but on the other hand, obviously I don't think
there's anything wrong with my mindset and attitude, because otherwise I'd
change it myself. And *I* would change it. It's not your job to change my
mindset.

Unfortunately thats *precisely* the sort of arrogance that one can't
afford. If someone isn't open to having their mindset changed by
others, then they're probably useless as an employee. Imagine if you
hired someone who refused point blank to change to your in-house
coding standard, or to use your in-house libraries....

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
R

Richard Heathfield

Mark McIntyre said:
Unfortunately thats *precisely* the sort of arrogance that one can't
afford.

Well, it's not actually arrogance. The point is that my mind is *my* mind,
not *your* mind. If you want to hire it, fine. But *changing* it is my
prerogative, just as changing your mind is your prerogative.
If someone isn't open to having their mindset changed by
others, then they're probably useless as an employee. Imagine if you
hired someone who refused point blank to change to your in-house
coding standard, or to use your in-house libraries....

The trick is to review the coding standard during your first day, and come
up with a long list of proposed changes (which, naturally, you will be
prepared to justify). Once you've won that battle, you can start in on
reviewing the in-house libraries. :)
 
M

Mark McIntyre

Mark McIntyre said:


Well, it's not actually arrogance.

Personally I consider it the height of arrogance to believe
unequivocally that you cannot possibly be wrong.
The point is that my mind is *my* mind,
not *your* mind. If you want to hire it, fine. But *changing* it is my
prerogative, just as changing your mind is your prerogative.

Ok, if you prefer - I prefer to train a smart junior who's not afraid
to admit he has stuff to learn, than to try to persuade a senior, who
thinks he knows it all, to accept that he doesn't. .
The trick is to review the coding standard during your first day, and come
up with a long list of proposed changes

Ah - the old "show what a pompous arse you are" trick. :)

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

Fibonacci 0
Adding adressing of IPv6 to program 1
C language. work with text 3
code review 26
Can't solve problems! please Help 0
compressing charatcers 35
Strange bug 65
K&R exercise 5-5 10

Members online

Forum statistics

Threads
473,772
Messages
2,569,589
Members
45,100
Latest member
MelodeeFaj
Top