Bug/Gross InEfficiency in HeathField's fgetline program

  • Thread starter Antoninus Twink
  • Start date
R

Richard Heathfield

Tor Rustad said:
Richard said:
Tor Rustad said:
[...]
You did not answer the question.

It was a loaded question, so I engaged the safety catch.

I won this on a bluff then, they didn't misuse strncpy. :)

My answer was: "If they've misused strncpy, I would consider them to be
people capable of misusing strncpy." This is a statement of the form "if A
is true, then B is true", which doesn't enable us to deduce anything about
the state of B if A is false, so I don't quite see how you claim to have
"won" anything. Still, if *you* think you've won, that's fine by me.
 
¬

¬a\\/b

In data Mon, 8 Oct 2007 00:16:07 +0200 (CEST), Antoninus Twink
scrisse:
The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
char *t = malloc(strlen(s) + 1);

this goes in seg fault here if s==0
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;

idem than above
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}

/* 0 for error */
/* instring substitution */
char* dot_to_underscore(char *s)
{char *t=s;
l1:;
if(t==0) return 0;
if(*t=='.') *t='_';
if(*t) {++t; goto l1;}
return s;
}

this not
 
J

Joachim Schmitz

¬a\/b said:
In data Mon, 8 Oct 2007 00:16:07 +0200 (CEST), Antoninus Twink
scrisse:


this goes in seg fault here if s==0
garbage in, garbage out. If you don't feed it s string (a '\0' terminated
char[]), it'll crash.
That function is supposed to work on a string, and ia as RH told us
elsethread, called exactly once in his program and not supposed to be a
library routine.
idem than above
Same issue here, only that Antonius claimed his version to be bug free, so
he'd better have checked the s==NULL case...
/* 0 for error */
/* instring substitution */
char* dot_to_underscore(char *s)
{char *t=s;
l1:;
if(t==0) return 0;
if(*t=='.') *t='_';
if(*t) {++t; goto l1;}
return s;
}

this not
But it is doing something entirly different: it modifies the string instead
of creating a duplicate and modify only that.

However: I hope that RH free()s the malloc()ed memory later in his code, (as
he claims not doing so to be sloppy style in another thread.)

Bye, Jojo
 
R

Richard Heathfield

Joachim Schmitz said:
...
garbage in, garbage out.

This function is not even *called* unless s is non-null, so the situation
simply doesn't arise.

However: I hope that RH free()s the malloc()ed memory later in his code,

I do indeed. And no, I didn't retrofit the free() as a result of your
query. It was already there. But your query did at least cause me to
check...
(as he claims not doing so to be sloppy style in another thread.)

....because we're all capable of being sloppy occasionally, aren't we? So it
was a worthwhile question to raise. Had I been sloppy, this paragraph
would have been an apology and an announcement of a URL to the fix. But I
hadn't, so it isn't.
 
K

Kelsey Bjarnason

To know the secret codes is a powerful motive. But it is the attiude of a
hacker, not a software engineer.

Context: while(*p++ = *q++);

Terseness - to a point - is a good thing. Some languages require
constructs such as var1 = var1 + 1, where C allows var1++, some require
var1 = var1 + var2 where C allows var1 += var2.

It doesn't take long, particularly when working with descriptive
identifiers, to learn how the terseness benefits coding; it doesn't take
long after that to learn terseness as a habit.

Sure, there are "more clear" ways to write the loop above; but that "more
clear" is largely dependent upon the relative skill of the coder. An
experienced C slinger will see the code as a gestalt and know exactly what
it does, the newbie may not - but the newbie isn't expected to grok a lot
of things, they're still learning.

It's a case of which is the more desirable goal - making things which
newbies have to learn, or making things easier and more efficient for
skilled coders?
 
K

Kelsey Bjarnason

[snips]

in the little i know overlap is not a problem in a x86 cpu
because the C instruction of assignament is done trhu a register or
trhu push pop; so where is the problem?

Assuming that the code will execute only on x86 processors?
 
M

Malcolm McLean

Richard Heathfield said:
...because we're all capable of being sloppy occasionally, aren't we? So
it
was a worthwhile question to raise. Had I been sloppy, this paragraph
would have been an apology and an announcement of a URL to the fix. But I
hadn't, so it isn't.
It is rather lackadaisical, Friday afternoon type code. Not for the reason
the OP claimed, but because it hardcodes the characters / strings to be
replaced, when they should be passed in by parameters.
However the job is only to replace one character with another in a string,
very much bread and butter programming, whilst the real interest of the
program is elsewhere. So the fuss is absurd.
 
P

Peter Pichler

Peter said:
That alone would be enough to put most people out of the
running in my book.


This wouldn't help.

[Interview]er: "Here is a wooden box to keep apples in.
What security flaws can you find in it?"
[Interview]ee (after a careful examination):
"There aren't any."
er: "You failed the interview. You didn't consider the risk
of someone breaking someone's skull with it."
ee: "That's not a proper use of the box! The box is
perfectly safe if used properly."
er: "That wouldn't help."
 
B

Ben Bacarisse

Peter Pichler said:
Peter said:
That alone would be enough to put most people out of the
running in my book.


This wouldn't help.

[Interview]er: "Here is a wooden box to keep apples in.
What security flaws can you find in it?"
[Interview]ee (after a careful examination):
"There aren't any."
er: "You failed the interview. You didn't consider the risk
of someone breaking someone's skull with it."
ee: "That's not a proper use of the box! The box is
perfectly safe if used properly."
er: "That wouldn't help."

Analogies are slippery things. I could have gone this way:

er: "You failed the interview. You didn't consider the risk
of staking them too high and someone being hurt if they
fall over."
ee: "That's not a proper use of the box! The box is
perfectly safe if used properly. On page 206 of the manual
for these boxes it clearly says stack no more than 5 high."
er: "That wouldn't help."

I post this because I don't like analogies, not because it illustrates
my point of view! To come clean, I am probably somewhere in the
middle with a leaning towards the view that errors result from sloppy
programming (in general) as much as from misuse of specific "risky"
functions.
 
R

Richard Bos

Peter Pichler said:
Peter said:
That alone would be enough to put most people out of the
running in my book.


This wouldn't help.

[Interview]er: "Here is a wooden box to keep apples in.
What security flaws can you find in it?"
[Interview]ee (after a careful examination):
"There aren't any."
er: "You failed the interview. You didn't consider the risk
of someone breaking someone's skull with it."
ee: "That's not a proper use of the box! The box is
perfectly safe if used properly."
er: "That wouldn't help."

That's not a valid analogy. It is very unlikely that an apple crate will
be used for craniofractural purposes, but as we've seen often enough in
comp.lang.c, strncpy() _will_ be used the wrong way.

Richard
 
J

J. J. Farrell

In data Sat, 13 Oct 2007 00:59:07 +0200, Tor Rustad scrisse:
[in conversation with Richard Heathfield]
Hmm.. haven't written it before, my first try would be:
/* pre-condition: checking for overlapped objects */
N = strlen(s);
for (i=0; i<N; i++) {
for (j=0; j<N; j++)
assert(t + i != s + j);
}

in the little i know overlap is not a problem in a x86 cpu
because the C instruction of assignament is done trhu a register or
trhu push pop; so where is the problem?

Perhaps in the fact that the code is required to run on a Frooble m57
processor? Where did x86 come in to this?
 
T

Tor Rustad

Peter said:
[...]

[Interview]er: "Here is a wooden box to keep apples in.
What security flaws can you find in it?"
[Interview]ee (after a careful examination):
"There aren't any."
er: "You failed the interview. You didn't consider the risk
of someone breaking someone's skull with it."


What has assets of national importance, in common with *apples*???

I be very surprised, if UK or US security professionals these days,
will hire people with such a complete lack of understanding of basic
security principles.
 
T

Tor Rustad

[my final post here - promise!]

Richard said:
Tor Rustad said:
Richard said:
Tor Rustad said: [...]

You did not answer the question.
It was a loaded question, so I engaged the safety catch.
I won this on a bluff then, they didn't misuse strncpy. :)

My answer was: "If they've misused strncpy, I would consider them to be
people capable of misusing strncpy." This is a statement of the form "if A
is true, then B is true", which doesn't enable us to deduce anything about
the state of B if A is false, so I don't quite see how you claim to have
"won" anything. Still, if *you* think you've won, that's fine by me.

If there was nothing to be afraid of, why did you take such extreme
safety measures then? In mathematical terms, I viewed your behavior as
"reductio ad absurdum". :)

In science, making statements that cannot be falsified, has *no value*.
So, how can your "no usage pitfalls with strncpy", be falsified by
measurement exactly?

Did you refer to your own C skill-level only?

By not making a stand, you lost.
 
P

Peter Nilsson

Tor Rustad said:
Hmm.. haven't written it before, my first try would be:

/* pre-condition: checking for overlapped objects */
N = strlen(s);
for (i=0; i<N; i++) {
for (j=0; j<N; j++)
assert(t + i != s + j);

}

Slightly less Grossly InEfficient ;) ...

/* test: if n bytes starting at s
// overlaps n bytes starting at t
*/
int mem_overlap(const void *s, const void *t, size_t n)
{
const char *u, *v;
size_t m = n;
if (n == 0) return (s == t);
for (u = t; m--; u++) if (u == s) return 1;
for (v = s; n--; v++) if (v == t) return 1;
return 0;
}


/* test: if destination s
// overlaps source string t
*/
int str_overlap(const char *s, const char *t)
{
const char *u = t;
do { if (u == s) return 1; } while (*u++);
while (--u != t) if (++s == t) return 1;
return 0;
}
 
R

Richard Heathfield

Tor Rustad said:
Peter said:
Peter said:
Tor Rustad said:

Knowing about the zero-padding, would give you a plus,
but more important would be if the candidate said
something about the security related usage pitfalls.

There aren't any -
[...]

[Interview]er: "Here is a wooden box to keep apples in.
What security flaws can you find in it?"
[Interview]ee (after a careful examination):
"There aren't any."
er: "You failed the interview. You didn't consider the risk
of someone breaking someone's skull with it."


What has assets of national importance, in common with *apples*???

Okay, let's try something oh so very different.

Interviewer: Here is St Edward's Crown, in which Queen Elizabeth II was
crowned in 1953. It is well over 400 years old, it is one of the UK's
crown jewels, and not only is it unquestionably an asset of national
importance in its own right, but also, when the Queen is wearing it, it
can be considered as a container for another asset of national importance
- i.e. the monarch's head. What security flaws can you find in it?
Interviewee (after a careful examination): There aren't any.
Interviewer: You failed the interview. You didn't consider the risk of
someone breaking someone's skull with it.

Better? (FCOL)
I be very surprised, if UK or US security professionals these days,
will hire people with such a complete lack of understanding of basic
security principles.

You have not demonstrated such a lack in your correspondents. I'm not quite
sure what you /have/ demonstrated (other than, perhaps, your own lack of
understanding of analogies).

The strncpy function does a simple task reasonably well. Yes, we all know
it has a lousy name, but apart from that it's a simple function, easy to
use properly. Yes, it's easy to use improperly too, but then so are lots
of C functions. All of them, really. And of course C operators are pretty
easy to misuse, as well. Er, so what? The mark of an anathematical
function is not that it can be used improperly, but that it cannot be used
properly.
 
R

Richard Heathfield

Tor Rustad said:
[my final post here - promise!]

Richard said:
Tor Rustad said:
Richard Heathfield wrote:
Tor Rustad said:
[...]

You did not answer the question.
It was a loaded question, so I engaged the safety catch.
I won this on a bluff then, they didn't misuse strncpy. :)

My answer was: "If they've misused strncpy, I would consider them to be
people capable of misusing strncpy." This is a statement of the form "if
A is true, then B is true", which doesn't enable us to deduce anything
about the state of B if A is false, so I don't quite see how you claim
to have "won" anything. Still, if *you* think you've won, that's fine by
me.

If there was nothing to be afraid of,

Um, now you've lost me. Where does "afraid" come from?
why did you take such extreme safety measures then?

What extreme safety measures? I am just naturally cautious. You said "would
you consider the OpenSSL maintainers sloppy and clueless?" and I'm not in
the habit of accusing people of being sloppy and clueless without having
more to go on than an unsubstantiated claim in a Usenet article. That
would be stupid, right?
In mathematical terms, I viewed your behavior as
"reductio ad absurdum". :)

Did you.
In science, making statements that cannot be falsified, has *no value*.

Could you please falsify that statement? Otherwise it has no value. said:
So, how can your "no usage pitfalls with strncpy", be falsified by
measurement exactly?

If your claim is that strncpy can be misused, I agree. So can all C
functions. Big deal. So what?
Did you refer to your own C skill-level only?

By not making a stand, you lost.

Lost? I wasn't aware that it was a competition. I thought we were trying to
arrive at some kind of truth. By trying to turn it into a contest, you
have discarded my interest. If you want to fight, please find someone
else.
 
¬

¬a\\/b

In data 15 Oct 2007 11:54:09 -0700, J. J. Farrell scrisse:
In data Sat, 13 Oct 2007 00:59:07 +0200, Tor Rustad scrisse:
[in conversation with Richard Heathfield]
What about the non-trivial case, where buffers overlap?
I'm not entirely sure this can be done in standard C, at least not the
obvious way, although I'd be glad to be proved wrong.
Hmm.. haven't written it before, my first try would be:
/* pre-condition: checking for overlapped objects */
N = strlen(s);
for (i=0; i<N; i++) {
for (j=0; j<N; j++)
assert(t + i != s + j);
}

and if for some i t+i==0 ?
and if for some j s+j==0 ?

yes this goes in segfault for strlen() in the "s" case

why the string "t" has len == N? (like string "s" and if len of "t" is
many time less of N the above have zero sense because can detect a
overlap where there are no one)
Perhaps in the fact that the code is required to run on a Frooble m57
processor? Where did x86 come in to this?

it is just an example. for me the problem could be implementig
somethig like below in your favorite cpu language (where in "b" and in
"k" there are the 2 arrays) a=1 all is ok, a=0 there is something
wrong

a=b: a==0#.e; #.1;
..0: ++a; jz .e;
..1: B*a#.0
c=a; c-=b;
r=k; r==0#.e; r+=c; jc .e; c=a; a=k; #.2;
/* jc means string not can be 0 anywhere
..e: a=0; #.3;
..2: a==b#.e; ++a; a<r#.2
++b; a=k; b<c#.2
a=1;
..3:

don't know if it has some meaning ....
i'm just a pirate :)
 
P

Peter Nilsson

Keith Thompson said:
I think you're misunderstanding what Richard said.

It think Richard revels in creating these misunderstandings.
I don't think any other regular manages to generate so many
long threads because of a subtle difference in the
interpretation of one word.
In fact, I suspect that everyone in this discussion is in
fundamental agreement about most things;

I'd say it's the fundamentals that we are in complete
disagreement over!
some of the concepts just aren't being communicated
clearly.

Both camps are perfectly clear. They're just incompatible.

As for strncpy() the function itself is not fundamentally
unsafe. Its semantics are precisely defined by the standard,
including the specification of cases where its behavior is
undefined. In contrast to gets(), it *can* be used safely
in well-written code.

Richard's point is there is no risk with gets, if you never
use it. That is perfectly true, but it is only barely
constructive.
It's true that it *can* be misused, but so can any
function.

Yes, but how can they be misused and why? Their very
existance creates risks, but additional risks are
involved to increasing degrees. There's very little
that can go wrong with exit(), there's quite a few
things that can go wrong with printf(), there's
almost nothing right about gets().
It's also true that, because of its misleading name, it's
particularly vulnerable

You mean it carries is a risk? Or do you see risk as a
stake, not a vulnerability?
to misuse by programmers who haven't
bothered to read and understand its specification (the
standard, a man page, whatever).

Things can go wrong for reasons other than 'not bothering'.
Risk analysis is a no-fault concept.

A just mopped floor can be slippery. An dirty unmopped floor
can be slippery too. The fundamental issue is that having a
floor for customers carries a risk. That risk needs to be
managed.

Richard would say there's no risk if you never let customers
walk on a dirty or just mopped floor. Perfectly logical. But
it offers no explanation of why you would do that, how you
achieve it, let alone what you need to do to cover the case
where you fail to achieve that.

Despite best intentions, customers will slip on floors. It's
because of attitudes like Richard's, that he wouldn't let
that happen, or that you shouldn't hire a stoor manager who
could let that happen, that many governments make public
liability insurance compulsory.
Because of that, any use of strncpy() should IMHO be viewed
with suspicion (like any use of a cast operator)

No Keith, there's no risk with cast operators either, if you
use them properly. Indeed, there is no risk with any aspect
of C, if you use it properly.

This is fatuousl^W trivially true, but it is not constructive.
And it certainly isn't the desired response from an interviewer
asking if the candidate can identify risks in a given function.
*unless* you're certain that the author actually knows what
he's doing.

What does certainty buy you? Even dmr has an errata page for
K&R2. If bwk can't be trusted to get it right first time, and
if Richard isn't available, who else could you possibly hire? ;)
I strongly suspect that most uses of strncpy() in
production code are incorrect.

Yes, but why? If all you can think of is that programmers
can't be bothered, then you really do need to change your
mindset. [Whether you do so is, of course, in your hands.]
But that's not the fault of strncpy() itself, which is a
perfectly innocent little function

Again with the blame. It's not about knowing who's to blame.
It's about knowing what can go wrong, where possible why, and
what you do about it. Sometimes things go wrong for reasons
outside of your control. [Sometimes things go wrong because
you have too much control!]

Contenting yourself that to know if something goes wrong, it
will be clear who is to blame is not a constructive analysis
of the risks.
that does exactly what it's supposed to do.

Rain does exactly what it's supposed to do. It can still
disrupt Wimbledon. When that happens, it's not because
the tournament director couldn't be bothered preventing
the rain.
It's the fault of the incorrect assumptions made by too
many programmers.

Yes, yes, but how do you manage that? Don't hire people
with incorrect assumptions is one possibility, but unless
you make the interview 20 years long, you're not going
to be able to make the right call in all cases.
And those assumptions are partly the fault of the way
strncpy() doesn't fit well into the context in which
it's specified.

Richard does not equate weakness with risk. In other words,
whatever weakness you might spot in strncpy, it is incidental
because it is not the function weakness that introduces the
risk.

These long threaded misunderstandings, where apparently
everyone actually agrees, would be fewer in number if
Richard were just a little more flexible with regards
to recognising that people don't always use terminology
in precisely the same way that he does. Moreover, the
fact that others use terminology differently is not
because they are clueless.
 
R

Richard Heathfield

Peter Nilsson said:
It think Richard revels in creating these misunderstandings.

Well, no, Richard doesn't. What Richard *does* do is (try to) write his
replies to what was written, which isn't necessarily what was intended.
Richard is not a mind-reader, and so he usually tries to avoid guessing at
some hidden meaning concealed by the words of an article, and instead
takes them at face value. This means that he sometimes misses irony and
sarcasm (despite employing them freely, which makes him rather
hypocritical, but then - as he would say - we're all hypocritical, but
some of us are more honest about it!). He will, however, normally make
more effort to "interpret" the text if it's written by a newcomer who is
perhaps struggling to phrase his question in a way that he wouldn't if he
only knew exactly what the problem was. But when Richard replies to
articles written by people with more experience of programming in general
and C in particular, he is less prone to agonise over the text, trying to
discern the meaning behind the words.

Richard's attitude towards strncpy is this:

(a) it has a lousy name;
(b) there is rarely any point in using it;
(c) it is likely that most uses of it are flawed;
(d) there are occasions on which it is The Right Thing;
(e) because of (c), it is tempting to advise people to eschew it
completely, but because of (d), he thinks that a better solution would be
to educate people about its proper use and to correct their misconception
that it is a "safe" strcpy.

Richard's point is there is no risk with gets, if you never
use it. That is perfectly true, but it is only barely
constructive.

Actually, Richard's point about gets is that you should never use it,
because it *cannot* be used safely.

A just mopped floor can be slippery. An dirty unmopped floor
can be slippery too. The fundamental issue is that having a
floor for customers carries a risk. That risk needs to be
managed.

Richard would say there's no risk if you never let customers
walk on a dirty or just mopped floor.

Right. So it is necessary to mop the floor *properly* - which, in this
case, means in a timely manner, so that it has time to dry before the shop
opens. Likewise, it is necessary to use strncpy properly (if you use it at
all) - which means it is necessary to understand what it does.
Despite best intentions, customers will slip on floors.

Well, there's no helping some people. :)
It's because of attitudes like Richard's, that he wouldn't let
that happen, or that you shouldn't hire a stoor manager who
could let that happen, that many governments make public
liability insurance compulsory.

Richard disagrees. Richard's attitude is that the floor should be given
enough time to dry before customers are allowed to walk on it. If
Richard's attitude prevails and the floor *is* given time to dry,
customers won't slip on a wet floor (because there won't *be* a wet
floor). It is because of attitudes *not* like Richard's (e.g. "oops, we're
late with the floor-cleaning but what the hey, who cares, right?") that
people slip on wet floors.
No Keith, there's no risk with cast operators either, if you
use them properly. Indeed, there is no risk with any aspect
of C, if you use it properly.

And since most people don't use C properly, we should treat all C code with
suspicion. And Richard does! :)

Rain does exactly what it's supposed to do. It can still
disrupt Wimbledon. When that happens, it's not because
the tournament director couldn't be bothered preventing
the rain.

Lousy analogy - buffer overruns (say) are not an arbitrary phenomenon that
might happen to your code. They are controllable. The programmer can't
stop buffer overrun attempts any more than the tournament director can
stop the rain but, unlike the TD, the programmer is in a position to
prevent the bad effects of such attempts.
Yes, yes, but how do you manage that?

FIRE THEM ALL! Bwahahahahahahahahaha!

Er, sorry - Richard means, hire the right people, people who are bright,
skilful, and flexible. Have at least one Pedantic S.O.B. on the team and
preferably two. (That could actually be their job title, right?) Have at
least one of them in on every code review, and review *all* code. And
value correctness over feature counts.

These long threaded misunderstandings, where apparently
everyone actually agrees, would be fewer in number if
Richard were just a little more flexible with regards
to recognising that people don't always use terminology
in precisely the same way that he does. Moreover, the
fact that others use terminology differently is not
because they are clueless.

Richard thinks these long threaded misunderstandings would be a lot shorter
if everyone just accepted that Richard Is Always Right (for a certain
value of "Richard", obviously[1]). Can we, at least, all agree on *that*?


[1] And, alas, for a certain value of "Always".
 
K

Keith Thompson

Richard Heathfield said:
Er, sorry - Richard means, hire the right people, people who are bright,
skilful, and flexible. Have at least one Pedantic S.O.B. on the team and
preferably two. (That could actually be their job title, right?) Have at
least one of them in on every code review, and review *all* code. And
value correctness over feature counts.
[...]

Are you hiring Pedantic S.O.B.s?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

Fibonacci 0
Adding adressing of IPv6 to program 1
C language. work with text 3
code review 26
Can't solve problems! please Help 0
compressing charatcers 35
Strange bug 65
K&R exercise 5-5 10

Members online

Forum statistics

Threads
473,780
Messages
2,569,608
Members
45,252
Latest member
MeredithPl

Latest Threads

Top