Bug/Gross InEfficiency in HeathField's fgetline program

Richard Heathfield · Oct 17, 2007

Keith Thompson said:

Richard Heathfield said:
Richard Heathfield said:

Er, sorry - Richard means, hire the right people, people who are bright,
skilful, and flexible. Have at least one Pedantic S.O.B. on the team and
preferably two. (That could actually be their job title, right?) Have at
least one of them in on every code review, and review *all* code. And
value correctness over feature counts.

Click to expand...

[...]

Are you hiring Pedantic S.O.B.s?

I already have more than enough PSOBs, thanks!

James Kuyper Jr. · Oct 17, 2007

¬a\/b said:
In data 15 Oct 2007 11:54:09 -0700, J. J. Farrell scrisse:

In data Sat, 13 Oct 2007 00:59:07 +0200, Tor Rustad scrisse:
[in conversation with Richard Heathfield]

Click to expand...

Click to expand...

[Re: implementing an equivalent of strncpy(t,s,n)]
....
This function doesn't require that the source be a null terminated
string. It could legally be a pointer to a character array of at least n
characters, that contains no null characters. If so, the strlen() call
will not terminate at the end of the array, so this code would have
undefined behavior. What is really needed can be written as follows:

for(N=0; N<n && s[N]; N++);

Alternatively,

void *end = memchar(s, '\0', n);
N = (end ? (char*)end - s : n);

which might be faster, depending about how poorly my loop is optimized,
and how well memchar() is optimized.

and if for some i t+i==0 ?
and if for some j s+j==0 ?

What is the problem that you see in that case? Null pointer values can
be compared for equality other pointer values, with well-defined
behavior. The problem that you could worry about is that t+i or s+j
might go more than one past the end of the array that they point into.

However, the implementor might as well assume that both t and s point
into sufficiently large arrays, since as far as I know there's no way to
test against the possibility that they don't.

yes this goes in segfault for strlen() in the "s" case

There's no guarantee in the C standard that it goes into segfault.
That's platform-dependent.

why the string "t" has len == N?

It doesn't matter to this function whether or not t points at a string,
much less what the length of that string is. All that matters is that t
must point at a position where you can write at least n characters with
defined behavior.

Orlando B. Salazar · Oct 17, 2007

Mark McIntyre scribbled:

Personally I consider it the height of arrogance to believe
unequivocally that you cannot possibly be wrong.

Are you thoroughly confident that someone that believes unequivocally
that they cannot possibly be wrong has reached the height of arrogance?

Regards,
Orlando B. Salazar

¬a\\/b · Oct 17, 2007

In data Mon, 15 Oct 2007 18:23:54 -0700, Peter Nilsson scrisse:

Slightly less Grossly InEfficient ...

/* test: if n bytes starting at s
// overlaps n bytes starting at t
*/
int mem_overlap(const void *s, const void *t, size_t n)
{
const char *u, *v;
size_t m = n;
if (n == 0) return (s == t);
for (u = t; m--; u++) if (u == s) return 1;
for (v = s; n--; v++) if (v == t) return 1;
return 0;
}

/* test: if n bytes starting at s
// overlaps n bytes starting at t
*/
/* assume a pointer has the same size of an int and one unsigned */
/* return 1 if error or overlap 0 otherwise */
int mem_overlap123(char* s, int n, char* t, int m)
{if(s==0|| t==0 || n<0 || m<0) return 0;

/* no array can have the address 0
if( ((int)s)>=0 &&((int)(s+n-1))<= 0) return 1;
if( ((int)t)>=0 &&((int)(t+n-1))<= 0) return 1;

if( ((int)s)<=0 &&((int)(s+n-1))>= 0) return 1;
if( ((int)t)<=0 &&((int)(t+n-1))>= 0) return 1;

/* s----- t------ || t----- s------*/
if( (unsigned)(s+n-1) < (unsigned) t) return 0;
if( (unsigned)(t+m-1) < (unsigned) s) return 0;
return 1;
}

not tested
how many errors do you see?

Malcolm McLean · Oct 17, 2007

Tor Rustad said:
In science, making statements that cannot be falsified, has *no value*.
So, how can your "no usage pitfalls with strncpy", be falsified by
measurement exactly?

A hypothesis which is empirical in nature is stronger if it survives an
attempt at falsification. That's not the same thing as saying that every
statement in science must be falsifiable, or it has no value. A theorem, as
opposed to theory, cannot be falsified by experiment, for example, but
theorems are very useful to scientists.

Malcolm McLean · Oct 17, 2007

Ben Bacarisse said:
[Interview]er: "Here is a wooden box to keep apples in.
What security flaws can you find in it?"
[Interview]ee (after a careful examination):
"There aren't any."
er: "You failed the interview. You didn't consider the risk
of someone breaking someone's skull with it."
ee: "That's not a proper use of the box! The box is
perfectly safe if used properly."
er: "That wouldn't help."

Click to expand...

Analogies are slippery things. I could have gone this way:

er: "You failed the interview. You didn't consider the risk
of staking them too high and someone being hurt if they
fall over."
ee: "That's not a proper use of the box! The box is
perfectly safe if used properly. On page 206 of the manual
for these boxes it clearly says stack no more than 5 high."
er: "That wouldn't help."

I post this because I don't like analogies, not because it illustrates
my point of view! To come clean, I am probably somewhere in the
middle with a leaning towards the view that errors result from sloppy
programming (in general) as much as from misuse of specific "risky"
functions.

You've got to have a degree of knowledge about the fruit industry before you
can say whether the box is dangerous.
For instance if the boxes look exactly like orgage boxes, which are
typically stacked ten high, then the fact that they will topple over if
stacked more than five is an obvious danger.
However if established practise in fruit warehouses is to stack boxes no
more than three high, and there is nothing about the apple boxes to suggest
that they are unusual, then it is not a safety risk. In fact we create a
safety risk by sticking a "stack no more than five high" label on it,
because it gets people out of the habit of observing notices.

jameskuyper · Oct 17, 2007

¬a/b wrote:
....

/* test: if n bytes starting at s
// overlaps n bytes starting at t
*/
/* assume a pointer has the same size of an int and one unsigned */
/* return 1 if error or overlap 0 otherwise */
int mem_overlap123(char* s, int n, char* t, int m)
{if(s==0|| t==0 || n<0 || m<0) return 0;

/* no array can have the address 0
if( ((int)s)>=0 &&((int)(s+n-1))<= 0) return 1;
if( ((int)t)>=0 &&((int)(t+n-1))<= 0) return 1;

if( ((int)s)<=0 &&((int)(s+n-1))>= 0) return 1;
if( ((int)t)<=0 &&((int)(t+n-1))>= 0) return 1;

/* s----- t------ || t----- s------*/
if( (unsigned)(s+n-1) < (unsigned) t) return 0;
if( (unsigned)(t+m-1) < (unsigned) s) return 0;
return 1;
}

not tested
how many errors do you see?

You use 'n' in every location where I would have expected you to use
'm'. Why have a fourth argument if you're going to ignore it?

I'd recommend const-qualifying the pointer arguments, using size_t for
the integer arguments, and I find your coding style hard to read and
hard to debug, but those aren't errors.

This code is based upon two non-portable assumptions that you document
in the comments, but it also depends upon some additional unportable
assumptions that you don't document. You assume that because an
integer type has the same size as a pointer, a pointer will converted
to a unique value of that type; while it is certainly possible to
implement pointer->int conversions that way, and indeed likely,
tthere's no obligation for an implementation to do so. This is what
intptr_t and uintptr_t are for - use them, and do compile-time
checking with #error to verify that they exist.

You assume that it is possible to convert a pointer into an integer,
and then derive meaningful information about that pointer by examining
the integer's value. In particular, you seem to be assuming that a
null pointer will convert to an integer with a value of 0, and that
the order of two such integers tells you something useful about the
order of the positions in memory pointed at by the corresponding
pointers.

There are probably implementations where all of the non-portable
assumptions you make are true, but there are also ones where they are
not.

Richard Heathfield · Oct 17, 2007

Malcolm McLean said:

A hypothesis which is empirical in nature is stronger if it survives an
attempt at falsification. That's not the same thing as saying that every
statement in science must be falsifiable, or it has no value. A theorem,
as opposed to theory, cannot be falsified by experiment, for example, but
theorems are very useful to scientists.

Theorems certainly /can/ be falsified (at which point they instantly stop
being theorems, because theorems are supposed to be true statements). For
example, Pythagoras's Theorem (which is a statement about Euclidean
geometry) is a theorem that can easily be falsified (if it is not true).
All you have to do is find a right triangle lying flat on the Euclidean
plane, such that the square of its hypotenuse is not equal to the sum of
the squares of the other two sides.

pete · Oct 17, 2007

Richard Heathfield wrote:

Richard's attitude towards strncpy is this:

(a) it has a lousy name;
(b) there is rarely any point in using it;
(c) it is likely that most uses of it are flawed;
(d) there are occasions on which it is The Right Thing;
(e) because of (c), it is tempting to advise people to eschew it
completely, but because of (d),
he thinks that a better solution would be
to educate people about its proper use
and to correct their misconception that it is a "safe" strcpy.

I have never written a program that used strncpy,
except as an academic exercise.

How often do you use strncpy, Dr Heath Field?

Tor Rustad · Oct 18, 2007

Richard said:
Tor Rustad said:
[...]

What has assets of national importance, in common with *apples*???

Click to expand...

Okay, let's try something oh so very different.

Why not try something computer related?

You have not demonstrated such a lack in your correspondents.

Amazing, I was *not* hiring someone to protect *apples*, *crown* or
looking for a clueless in security, unable to identify *common* errors.

For an introduction to basic security principles, see e.g. [1]:

"Principle 32. Identify and prevent common errors and vulnerabilities

Discussion: Many errors reoccur with disturbing regularity - errors such
as buffer overflows, race conditions, format string errors, failing to
check input for validity, and programs being given excessive privileges.
Learning from the past will improve future results."

The strncpy function does a simple task reasonably well. Yes, we all know
it has a lousy name, but apart from that it's a simple function, easy to
use properly. Yes, it's easy to use improperly too, but then so are lots
of C functions.

The *relevant point*, is that this C function has been misused a lot,
and a buffer overflow can result in a total compromise of a computer
system. The probability of misuse, isn't low either.

[1] NIST Special Publication 800-27, "Engineering Principles for
Information Technology Security".

Tor Rustad · Oct 18, 2007

Malcolm said:
A hypothesis which is empirical in nature is stronger if it survives an
attempt at falsification. That's not the same thing as saying that every
statement in science must be falsifiable, or it has no value. A theorem,
as opposed to theory, cannot be falsified by experiment, for example,
but theorems are very useful to scientists.

I didn't say, the only way to falsify statements, was by experiment. In
Mathematics, we usually falsify the incorrect conclusions or theorems,
by logic. However, I do admit that definitions and axioms are useful,
but can't be falsified.

We have lots of data now on strncpy usage, so why Richard didn't clarify
the conditions I could falsify his "no usage pitfalls with strncpy",
shows he was just word twisting.

Peter Pichler · Oct 18, 2007

Richard said:
Tor Rustad said:

You have not demonstrated such a lack in your correspondents. I'm not quite
sure what you /have/ demonstrated (other than, perhaps, your own lack of
understanding of analogies).

Richard, Tor is by his own words in charge of recruitment. Wherever
I looked, such people are called "managers". Managers do not have to
demonstrate anything. They are omniscient by definition. You have lost
before you even knew there was a contest.

Tor Rustad · Oct 18, 2007

Peter said:
Richard, Tor is by his own words in charge of recruitment. Wherever
I looked, such people are called "managers".

FYI, more than management can disqualify people from being hired,
especially during the technical session of the interview.

Richard Heathfield · Oct 18, 2007

pete said:

I have never written a program that used strncpy,
except as an academic exercise.

How often do you use strncpy, Dr Heath Field?

Well, I've just checked my current code base here, and can't find any
instances at all. That doesn't mean I won't find instances in older code,
and it doesn't mean I've never used it "on site", so to speak, but it does
seem that I rarely if ever use it. It is entirely possible that, like you,
I have never called it in production code. I don't recall ever doing so,
anyway. But I do know how it works, and I know how to use it if ever the
need should arise.

Richard Heathfield · Oct 18, 2007

[ I already wrote a long reply to this, but the newsreader crashed on send
(no, I did *not* write the newsreader, and no, I *will* not write a
newsreader). I don't have time to reconstruct the whole reply. This is a
rather polite summary of my previous, very short-tempered reply. ]

Tor Rustad said:

Amazing, I was *not* hiring someone to protect *apples*, *crown* or
looking for a clueless in security, unable to identify *common* errors.

(a) you were *not* hiring *anyone*. You were taking part in a Usenet
discussion.
(b) if you think I'm clueless, why bother to continue this discussion?
(c) what if I think /you/ are clueless, unable to recognise *common* sense?

For an introduction to basic security principles, see e.g. [1]:

"Principle 32. Identify and prevent common errors and vulnerabilities

Discussion: Many errors reoccur with disturbing regularity - errors such
as buffer overflows, race conditions, format string errors, failing to
check input for validity, and programs being given excessive privileges.
Learning from the past will improve future results."

In my experience, the following bug is far more common than strncpy:

char *t;

strcpy(t, s);

The bug here is in failing to allocate *any storage at all* for t. I have
seen this happen in production code far more than I have seen strncpy
misused (or indeed used at all). In fact, I have seen what we might call
"the char * bug" (if only there weren't so many other bugs that could
easily have the same name) so often that it definitely counts as
disturbing regularity.

If you ban strncpy, logically you have to ban char * too, because it's a
far greater risk. To do otherwise is to be guilty of rearranging the
deckchairs on the Titanic.

The *relevant point*, is that this C function has been misused a lot,
and a buffer overflow can result in a total compromise of a computer
system. The probability of misuse, isn't low either.

So ban pointers. They cause far more trouble than strncpy.

FCOL, Tor. Wake up and smell the real risk - clueless programmers, hired by
witless buffoons because they have good hair and a good CV.

¬a\\/b · Oct 18, 2007

In data Wed, 17 Oct 2007 17:00:13 +0200, ¬a\/b scrisse:

/* test: if n bytes starting at s
// overlaps n bytes starting at t
*/
/* assume a pointer has the same size of an int and one unsigned */
/* return 1 if error or overlap 0 otherwise */
int mem_overlap123(char* s, int n, char* t, int m)
{if(s==0|| t==0 || n<0 || m<0) return 0;

^^^^^^^^^^^^^

return 1

/* no array can have the address 0
if( ((int)s)>=0 &&((int)(s+n-1))<= 0) return 1;

if( ((int)t)>=0 &&((int)(t+n-1))<= 0) return 1;

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

if( ((int)t)>=0 &&((int)(t+m-1))<= 0) return 1;

if( ((int)s)<=0 &&((int)(s+n-1))>= 0) return 1;
if( ((int)t)<=0 &&((int)(t+n-1))>= 0) return 1;

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

¬a\\/b · Oct 18, 2007

In data 17 Oct 2007 13:56:19 -0700, (e-mail address removed) scrisse:

¬a/b wrote:
...

/* test: if n bytes starting at s
// overlaps m bytes starting at t
*/

You use 'n' in every location where I would have expected you to use
'm'. Why have a fourth argument if you're going to ignore it?

I'd recommend const-qualifying the pointer arguments, using size_t for
the integer arguments, and I find your coding style hard to read and
hard to debug, but those aren't errors.

i do not use const because it make difficult the language

This code is based upon two non-portable assumptions that you document
in the comments, but it also depends upon some additional unportable
assumptions that you don't document. You assume that because an
integer type has the same size as a pointer, a pointer will converted
to a unique value of that type; while it is certainly possible to
implement pointer->int conversions that way, and indeed likely,
tthere's no obligation for an implementation to do so. This is what
intptr_t and uintptr_t are for - use them, and do compile-time
checking with #error to verify that they exist.

yes. why the standard not say that sizeof(unsigned int) == sizeof(int)
== sizeof(char*)?
for me one memory object can not be > INT_MAX chars

You assume that it is possible to convert a pointer into an integer,
and then derive meaningful information about that pointer by examining
the integer's value. In particular, you seem to be assuming that a
null pointer will convert to an integer with a value of 0, and that
the order of two such integers tells you something useful about the
order of the positions in memory pointed at by the corresponding
pointers.

yes i assume they are all numbers in the same integer type seen like a
circumference

James Kuyper Jr. · Oct 18, 2007

¬a\/b said:
In data 17 Oct 2007 13:56:19 -0700, (e-mail address removed) scrisse: ....
yes. why the standard not say that sizeof(unsigned int) == sizeof(int)
== sizeof(char*)?

Section 6.2.5p6 requires sizeof(unsigned int) == sizeof(int). It's only
sizeof(char*) which is not required to match.

> for me one memory object can not be > INT_MAX chars

It's arguable (and has frequently been argued) whether the standard's
description of sizeof() implicitly guarantees that you cannot statically
or automatically allocate an object whose size is greater than SIZE_MAX.
It's far less clear whether you can apply the same argument to objects
dynamically allocated using calloc(). In any event, the standard says
nothing that limits the size to INT_MAX. On every implementation I've
ever used, SIZE_MAX (or the pre-C99 equivalent, ((size_t)-1) ) has been
at least twice as big as INT_MAX. That's not a C standard requirement,
just an observation.

One of the key goals of the C standard is to define the language
sufficiently loosely that it can be efficiently implemented on just
about any hardware. As a result, C has been implemented (often with
fairly good efficiency) on just about everything.

Imposing the requirements you suggest would make efficient
implementation of C on many platforms more difficult. I've used machines
where 16 bits was the most reasonable size for 'int', which had a LOT
more than 65536 bytes of memory installed. I'm certain that there will
be machines in the future (I wouldn't be surprised if they already
exist) where the natural size for an integer is 32 bits, which have far
more than 4GB of memory installed.

Other languages have different goals. In Java, efficiency of
implementation (and even implementability itself), is sacrificed, if
need be, to make it easier to write portable code. There's a place for
both types of languages, but C is, by design, a language where programs
are efficiently portable, rather than easily portable.

....

yes i assume they are all numbers in the same integer type seen like a
circumference

Obviously they are numbers, and they all have the same integer type,
since you cast them to that type. It's the other assumptions you've made
that are unportable. I have no idea what the word "circumference" is
doing in that sentence, and therefore have no idea what that part of the
sentence means. I can assure you that each of the assumptions I referred
to is seriously non-portable.

James Kuyper Jr. · Oct 18, 2007

¬a\/b said:
In data Wed, 17 Oct 2007 17:00:13 +0200, ¬a\/b scrisse:

// overlaps m bytes starting at t

This is why I prefer to put each parameter on a separate line, and to
add a short comment describing that parameter on the same line. That
makes it easier to keep the comments consistent with the parameters.

Tor Rustad · Oct 19, 2007

Richard said:
Tor Rustad said:

(b) if you think I'm clueless, why bother to continue this discussion?

So, you wouldn't have replied then, given the same conditions.

if ( tor_eval(richard) == cluless )
goto no_reply;

(c) what if I think /you/ are clueless, unable to recognise *common* sense?

This didn't make any sense, you did reply, didn't you?

Either "I think /you/ are clueless" is *false*, or you are posting
nonsense. You cannot have it both ways:

if ( richard_eval(tor) == cluless )
goto post_reply;

For an introduction to basic security principles, see e.g. [1]:

"Principle 32. Identify and prevent common errors and vulnerabilities

Discussion: Many errors reoccur with disturbing regularity - errors such
as buffer overflows, race conditions, format string errors, failing to
check input for validity, and programs being given excessive privileges.
Learning from the past will improve future results."

Click to expand...

In my experience, the following bug is far more common than strncpy:

char *t;

strcpy(t, s);

The bug here is in failing to allocate *any storage at all* for t.

Such an error, is detectable by statically checking tools, if compiler
doesn't catch it, lint tools like e.g. splint does.

$ cat -n main.c
1 #include <stdio.h>
2 #include <string.h>
3
4 int main(void)
5 {
6
7 const char *s = "Hello";
8 char *t;
9 char d[5];
10
11 /* BUG 1 - storage not allocated */
12 (void)strcpy(t, s);
13 (void)printf("'%s'\n", t);
14
15 /* BUG 2 - destination buffer not null terminated */
16 strncpy(d, s, sizeof d);
17 (void)printf("'%.*s'\n", (int)sizeof d, d);
18
19 /* BUG 3 - destination buffer truncated */
20 strncpy(d, s, sizeof d - 1);
21 (void)printf("'%.*s'\n", (int)sizeof d, d);
22
23 return 0;
24 }
$ gcc -ansi -pedantic -W -Wall main.c
$ splint main.c
Splint 3.1.1 --- 20 Jun 2006

main.c: (in function main)
main.c:12:15: Unallocated storage t passed as out parameter to strcpy: t
An rvalue is used that may not be initialized to a value on some
execution
path. (Use -usedef to inhibit warning)

Finished checking --- 1 code warning
$

Note that the bugs of class 2 and 3, gives no warning above.

So ban pointers. They cause far more trouble than strncpy.

To be honest, in safety-critical or security-critical software, I can't
ever remember being hit by buffer overflow or unallocated storage, in
production. Before someone is allowed to do the real thing, they need to
master the basics.

However, I have looked at alternatives, for example cyclone:

http://www.cs.umd.edu/~mwh/papers/cyclone-cuj.pdf

FCOL, Tor. Wake up and smell the real risk - clueless programmers, hired by
witless buffoons because they have good hair and a good CV.

The main risk where I work, is rather the "clever" insiders.

Fibonacci	0	May 13, 2023
Adding adressing of IPv6 to program	1	Feb 16, 2023
C language. work with text	3	Dec 10, 2021
code review	26	Feb 6, 2004
Can't solve problems! please Help	0	Sep 26, 2022
compressing charatcers	35	Apr 2, 2014
Strange bug	65	Nov 19, 2010
K&R exercise 5-5	10	Feb 20, 2007

Bug/Gross InEfficiency in HeathField's fgetline program

Richard Heathfield

James Kuyper Jr.

Orlando B. Salazar

¬a\\/b

Malcolm McLean

Malcolm McLean

jameskuyper

Richard Heathfield

pete

Tor Rustad

Tor Rustad

Peter Pichler

Tor Rustad

Richard Heathfield

Richard Heathfield

¬a\\/b

¬a\\/b

James Kuyper Jr.

James Kuyper Jr.

Tor Rustad

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads