Bug/Gross Inefficiency in Heathfield's fgetline program

  • Thread starter Antoninus Twink

Richard Heathfield

Charlie Gordon said:
Why do you include <stddef.h> instead of <string.h>?

I needed a definition for size_t. This is defined in <stddef.h> (as well as
in other places), which is why I included it. I didn't see any particular
need to include <string.h>, since it contained nothing I remembered
needing. On reflection, it would have been useful to pick up the
<string.h> prototype of strncpy, just on the off-chance that I had
misremembered the type of n. (And yes, had I done so, I could have omitted
the <stddef.h> header completely, since <string.h> also defines size_t.)
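
For what it's worth, a minimal sketch of that variant: <string.h> both
declares strncpy() (whose third parameter is size_t) and defines size_t
itself, so no other header is needed. The wrapper name copy_n is purely
illustrative:

#include <string.h>  /* declares strncpy() and defines size_t */

/* Copy at most n-1 characters of src and guarantee termination. */
static void copy_n(char *dst, const char *src, size_t n)
{
    if (n > 0)
    {
        strncpy(dst, src, n - 1);
        dst[n - 1] = '\0';
    }
}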
 

Richard Heathfield

Charlie Gordon said:
"Richard Heathfield" <[email protected]> a écrit dans le message de
news: (e-mail address removed)...


> Such bugs hardly survive rudimentary testing.

Oh, but they do. When I say "I have seen this happen in production code far
more than I have seen strncpy misused", I mean *real* production code,
code that has made it all the way through testing and into production.

> 90% of the IT work force. More than 90% of all code produced.

Yes. The *real* problem with software is that we as a society value code
quantity over quality.
 

Richard Heathfield

Charlie Gordon said:
I would be very interested to see your coding standard.

No, you wouldn't. :) Seriously, I don't have one, in the formal document
sense. Never saw the point.

On site one can't, diplomatically speaking, get away with "your coding
standard is rubbish, use this instead" on one's first day, whereas one
very often /can/ get away with "your coding standard makes for an
interesting read, but I've identified the following problems with it, and
I've added explanations of why they are problems, along with suggested
solutions, what do you think?". So the modifications that one suggests
will differ from site to site, making a single master document less useful
than it might otherwise be.

And for my own code, my coding standard is of course in my head. Is it
worth writing down? Maybe, maybe not.
 

Richard Heathfield

James Kuyper Jr. said:
No, there's a key difference between the sarcastic use of "clever" and
ordinary misuse of "dumb" as an insensitive synonym for stupid.

I'm looking at his usage, and he's saying that "clever" people believe they
are better than their own constraints, by which I take him to mean that
such people think that they can get away with writing code in a way that
they would criticise if other people were to write it in that way.

If I have interpreted him correctly, the behaviour he describes is properly
labelled as "stupid" or "dumb".

<snip>
 

Tor Rustad

Charlie said:
"Tor Rustad" <[email protected]> a écrit dans le message de
[...]

Why do (void) strcpy and not strncpy ?

No reason really. The strncpy() code, was added to an existing source.
That is an ugly convention anyway, and splint would be silly to complain
about the return value of strcpy or strncpy not being used.

The cast was put there, before I knew what lint level I wanted. When the
default level was sufficient, I dropped the cast, but copied the
printf() line already there. :)
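
For readers unfamiliar with the convention under discussion: casting a
call's result to (void) tells lint-like checkers that the return value is
being discarded deliberately. A minimal illustration (the function and its
contents are invented for the example):

#include <string.h>

void greet(char *buf)   /* buf must have room for "hello" */
{
    /* strcpy returns its first argument; the cast documents that we
       are ignoring it on purpose, which quiets picky lint levels. */
    (void) strcpy(buf, "hello");
}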
 

Tor Rustad

Richard said:
> Tor Rustad said:
[...]
>> There isn't a single *best solution* in security engineering,
>
> Agreed. Nevertheless, the best solution is to hire bright people.

The candidate will be evaluated on a number of different things; the
brightest may not be the best candidate for a position. I would rather put
it this way: being bright above a certain level is a precondition, and I
have no interest in training the clueless.

> Bright people should be able to work out how not to abuse strncpy,
> right?

I wouldn't bet my life on it.

Why not use both bright people and Occam's razor?

>
> That's "clever" as in "dumb", right? Just checking.

Not really; their IQ can still be very high. Some of the worst C code I
have audited has been written by "clever" people. It appears to me that the
smarter they are, the bigger the mess they can create before being detected.

During audits, I have seen "prototype", "student code" or "good weather
code". Those programmers have not complied with good software
engineering principles, perhaps because they never learned them or are too
lazy. The code audit was forced upon them, likely for the first time in
their lives.
> I fail to see how this says anything about strncpy, though.

I suggest you re-read my initial post in this sub-thread then, which you
replied to by posting a strncpy() implementation.
 

Malcolm McLean

Flash Gordon said:
Malcolm McLean wrote, On 20/10/07 06:56:

> Your suggestion would mean that a lot of embedded code would have to use
> short almost exclusively instead of int. Are you going to pay for all the
> rewriting?

It only applies to embedded machines where the address space is greater than
the register size. I can think of an obvious example - the typical 8 bit
scene. However, here the standard has spoken, and in my favour: int must be
16 bits on such machines.

However, granted that there are some such machines, int needs replacing with
short where one of these two holds: either the extra instructions to
calculate a 32 bit result produce unacceptable performance, or the space
taken is unacceptable. There will be a few cases.

However, if I am to pay for the rewriting, I also want the economic benefits
of decreased costs and easier integration and reuse.

My namesake, Malcom (sic) McLean, introduced containerised shipping. You
would have been the first to say "but Mr McLean, not all goods fit easily
into containers. Are you going to pay for all that hold space wasted as
ships sail around with half-filled containers?". It is an inefficiency, but
actually he revolutionised the cargo transport industry, simply by
increasing the ease of handling. Every container fits every crane, every
lorry and every railway truck, because there is only one size.

> Perhaps the designer is saving $25 on a $100 product. High speed memory
> (e.g. cache) is expensive, so however fast operations are on 64 bit
> integers you can massively increase your costs, or slow things down
> massively, by doubling the size of your basic integer type.

That's assuming you have a pattern of uniform cache usage. If you use 20% of
the cache 80% of the time, halving the effective cache size slows things
down by only about 10%: the hot set still fits, so only the remaining
accesses suffer. It's a cost, but not the massive hit you are suggesting.

> You seem to have forgotten that all this and more has already been pointed
> out to you. If you think the decision is wrong, start by taking it up with
> Intel, AMD, the POSIX standard group and MS. Most people will have some
> respect for the abilities of at least one of these groups, although which
> group will depend on the person.

It is no part of my case that the people who disagree with me are stupid.

> BTW, I happen to know that there are still a number of processors with the
> Z80 instruction set flying around, and a number of processors early in the
> 80x86 range as well. When I say flying around I mean that they are part of
> avionics systems on current aircraft.

As I said, Z80s have 8 bit registers but int must be 16 bits. Let's keep
that convention. The segmented 80x86, where non-address size ints do make
some sort of sense, was an atrocious design for that very reason. In
practice the problem was solved, on PCs at least, by extending the language.
It is now widely acknowledged that flat architectures are better.
 

Malcolm McLean

James Kuyper Jr. said:
Malcolm said:
>> Basically you are saying the paradigm
>>
>> int i;
>>
>> for(i=0;i<N;i++)
>>     array[i] = x;
>>
>> ought to be allowed to break down if 16 or 32 bit operations are faster
>> on machines with 32 or 64 bit address spaces.
>
> I'm confused by your example, and its supposed connection to what I said.
> Without definitions for N, array, and x, I'm left to assume that they all
> have reasonable definitions. As long as N <= INT_MAX, and assuming that
> array is defined as having at least N elements, and 'x' has a value that
> can safely be converted to the type of array[i], I see no way to interpret
> what I said as endorsing failure of that loop on such machines.

The point is that N isn't something we can control. It is given to us as "an
integral type" that counts the number of items in the array. However, we know
that N must fit in memory, or else our computer isn't powerful enough to
handle the problem.
So really N needs to be a size_t, and i needs to be a size_t as well, and
vast swathes of C code are obsolete or subtly broken. Unless you specify
that int shall be able to address any array.

> If the language were re-defined from scratch, I'd endorse making the
> size-named types which were added in C99 fundamental types, preferably
> with a nicer naming convention.

Exactly. We need a size_t. But it needs to be signed - the advantages
outweigh that extra bit. And it needs a nicer name, preferably a
three-letter one that suggests an arbitrary integer.
And we don't even need to change a word of the standard to achieve this.
 

Flash Gordon

Malcolm McLean wrote, On 21/10/07 08:07:

> It only applies to embedded machines where the address space is greater
> than the register size. I can think of an obvious example - the typical
> 8 bit scene. However, here the standard has spoken, and in my favour: int
> must be 16 bits on such machines.

I was not. It applies to plenty of 16 bit processors as well.

> However, granted that there are some such machines, int needs replacing
> with short where one of these two holds: either the extra instructions to
> calculate a 32 bit result produce unacceptable performance, or the space
> taken is unacceptable. There will be a few cases.

The space can be unacceptably large on the latest 64 bit computers, as
has been mentioned to you already.

> However, if I am to pay for the rewriting, I also want the economic
> benefits of decreased costs and easier integration and reuse.

You can have all the benefit it would give me. I've included the exact
amount of money in this post. Yes, that is correct, there is exactly no
money in this post.

> My namesake, Malcom (sic) McLean, introduced containerised shipping. You
> would have been the first to say "but Mr McLean, not all goods fit
> easily into containers. Are you going to pay for all that hold space
> wasted as ships sail around with half-filled containers?". It is an
> inefficiency, but actually he revolutionised the cargo transport
> industry, simply by increasing the ease of handling. Every container
> fits every crane, every lorry and every railway truck, because there is
> only one size.

Now try and put one of those containers in the back of a Transit or on
the back of a courier's bike. In either case you will find it does not work.

> That's assuming you have a pattern of uniform cache usage. If you use 20%
> of the cache 80% of the time, halving the effective cache size slows
> things down by only about 10%: the hot set still fits, so only the
> remaining accesses suffer. It's a cost, but not the massive hit you are
> suggesting.

No, it is not a massive cost like I suggested. Sometimes it is far
larger. For example, you have to go to the next smaller gate size, which
costs 4 times as much or more because it is new. Or the higher capacity
may simply not be available, thus killing the project.

Just because *you* do not approach the limits does not mean that others
are not. There are plenty of situations where people are working at the
limits.

Oh, and as has also been pointed out, the increased power consumption or
heat dissipation may not be acceptable (important considerations in
*lots* of products, including notebook PCs). That reminds me, we should
set the greens on you, since you are advocating needlessly increasing
energy consumption.

> It is no part of my case that the people who disagree with me are stupid.

Then go and try and convince them. If you succeed in convincing us it
will not change the situation; if you convince them it will.

> As I said, Z80s have 8 bit registers but int must be 16 bits. Let's keep
> that convention. The segmented 80x86, where non-address size ints do
> make some sort of sense, was an atrocious design for that very reason.

Or the 68000, which was an excellent design with a flat memory space, a 16
bit ALU and a larger than 16 bit address space. There are plenty of other
examples.

> In practice the problem was solved, on PCs at least, by extending the
> language. It is now widely acknowledged that flat architectures are better.

You hit the same problem with flat architectures. It has not been
uncommon for address registers to be wider than the ALU.
 

James Kuyper Jr.

Malcolm said:
James Kuyper Jr. said:
Malcolm said:
>>> Basically you are saying the paradigm
>>>
>>> int i;
>>>
>>> for(i=0;i<N;i++)
>>>     array[i] = x;
>>>
>>> ought to be allowed to break down if 16 or 32 bit operations are
>>> faster on machines with 32 or 64 bit address spaces.
>>
>> I'm confused by your example, and its supposed connection to what I
>> said. Without definitions for N, array, and x, I'm left to assume that
>> they all have reasonable definitions. As long as N <= INT_MAX, and
>> assuming that array is defined as having at least N elements, and 'x'
>> has a value that can safely be converted to the type of array[i], I
>> see no way to interpret what I said as endorsing failure of that loop
>> on such machines.
>
> The point is that N isn't something we can control. It is given to us as
> "an integral type" that counts the number of items in the array. ...


As I said, you didn't define N, so I had no idea whether or not it was
under your control. Even if N is not under your control, whether your
code enters that loop with a dangerous value of N is under your control:

if(N > INT_MAX || N > ELEMENTS(array))
{
    // Error handling
}
else
{
    // Loop code
}
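
ELEMENTS is not defined in the post; presumably it is the usual
element-count macro, something like this sketch, which is valid only for
true arrays, not for pointers:

#define ELEMENTS(a) (sizeof(a) / sizeof((a)[0]))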

There are, of course, other, more elegant ways of ensuring that the loop
is not entered with a dangerous value of N (for instance, the array
could be declared with a length of N), but if your code is doing nothing
to prevent that, it's poorly designed; it wouldn't pass code review in
my shop.

> ... However we know that N must fit in memory, or else our computer
> isn't powerful enough to handle the problem.
> So really N needs to be a size_t, and i needs to be a size_t as well,
> and vast swathes of C code are obsolete or subtly broken. Unless you
> specify that int shall be able to address any array.

<snip>

> Exactly. We need a size_t. But it needs to be signed - the advantages
> outweigh that extra bit. And it needs a nicer name, preferably a
> three-letter one that suggests an arbitrary integer.
> And we don't even need to change a word of the standard to achieve this.

How could you add a requirement not currently in the standard without
changing a word of it?

I presume the three-letter word you're suggesting is 'int'. The
standard's current specification for 'int' is that it "... has the
natural size suggested by the architecture of the execution environment
...". Historically, and probably also in the future, the natural size on
some machines has not been one which could meet your requirement, and
there's a lot of code out there which assumes that 'int' is indeed the
"natural size". I, for instance, have been writing such code for about
30 years now. Therefore, for the sake of backwards compatibility I would
oppose any requirement that would prohibit it from being the natural
size on those machines.

It's not as if having to use a typedef for a type which meets your
requirement would be a major problem for those programmers who wish to
use it.
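
A sketch of that typedef route, using ptrdiff_t as one plausible signed,
pointer-sized type (the name idx is invented for the example):

#include <stddef.h>   /* ptrdiff_t */

/* A signed index type: ptrdiff_t is the standard signed type for
   subscript differences, so in practice it can index any array. */
typedef ptrdiff_t idx;

double sum(const double *x, idx n)
{
    double total = 0.0;
    idx i;

    for (i = 0; i < n; i++)
        total += x[i];
    return total;
}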
 

Malcolm McLean

James Kuyper Jr. said:
> I presume the three-letter word you're suggesting is 'int'. The standard's
> current specification for 'int' is that it "... has the natural size
> suggested by the architecture of the execution environment ...".
> Historically, and probably also in the future, the natural size on some
> machines has not been one which could meet your requirement, and there's a
> lot of code out there which assumes that 'int' is indeed the "natural
> size". I, for instance, have been writing such code for about 30 years
> now. Therefore, for the sake of backwards compatibility I would oppose any
> requirement that would prohibit it from being the natural size on those
> machines.
>
> It's not as if having to use a typedef for a type which meets your
> requirement would be a major problem for those programmers who wish to use
> it.

The "natural size" is either the size of a register or the size of the
address bus. Usually the two are identical, but not always.
Where registers are 8 bits the standard has spoken - int is to be the size
of the address bus. I suppose there is some processor out there with 8 bit
registers and 32 bits of memory space to falsify me here, but basically 256
bytes of memory isn't enough for most programs, whilst 65,536 is generally
adequate for a small problem that doesn't demand a fast processor.

The problem comes with 64 bit machines. Is the natural size 32 bits, which
will be enough for most purposes, and might be a bit faster, or 64 bits?

I'd say that if you allow objects of over 4GB, then the natural integer size
is 64 bits. Integers are usually used either to index arrays or to count
them. Not every integer, obviously, but the great majority. So there's a
need for a type that is able to index any array that the caller might throw
at it.
However, there is also a need for a 32 bit type to help out the
micro-optimiser. Integers count things, and most things come in blocks of
substantially less than 2 billion.
So we do actually need a change to the standard: a type for arbitrary array
index operations, and a type that is guaranteed to be fast. But the majority
of use will be the first type. Normally it is more important that software
fits together than that routines squeeze the last drop of efficiency out of
the CPU. Again, not always, but normally. Projects fail because code becomes
unmanageable, not usually because the processor isn't fast enough.
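
As an aside, C99 arguably already provides both kinds of type, albeit not
under three-letter names; a minimal sketch:

#include <stddef.h>   /* size_t: wide enough to index any object */
#include <stdint.h>   /* int_fast32_t: fastest type of at least 32 bits */

size_t       any_index;     /* the "arbitrary array index" role */
int_fast32_t fast_counter;  /* the "guaranteed to be fast" role */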
 

santosh

Malcolm said:
> I suppose there is some processor out there with 8 bit registers and 32
> bits of memory space to falsify me here,

Well then, use size_t or long. What's the problem?

> The problem comes with 64 bit machines. Is the natural size 32 bits,
> which will be enough for most purposes, and might be a bit faster, or
> 64 bits?

Whatever the processor manufacturer decides to implement. Neither is
intrinsically "natural." It is what you define it to be.

> I'd say that if you allow objects of over 4GB, then the natural
> integer size is 64 bits.

Which it is on most 64 bit systems.

> So there's a need for a type that is able to index any
> array that the caller might throw at it.

That type in C is size_t, is it not?

> However, there is also a need for a 32 bit type to help out the
> micro-optimiser. Integers count things, and most things come in blocks
> of substantially less than 2 billion.

Well, nearly all modern platforms have a native 32 bit type.

> So we do actually need a change to the standard.

No need, IMO.

> A type for arbitrary array index operations,

size_t or intmax_t.

> and a type that is guaranteed to be fast.

int.

> But the majority of use will be the first type.

No, it really would depend on the code and the actual array being
indexed. If you know that your array will have fewer than INT_MAX
elements, you don't need anything more than int. To be sure of addressing
4GB, use unsigned long. Anything more, and you can use size_t or long long
etc.

> Projects fail because code becomes unmanageable, not usually
> because the processor isn't fast enough.

Yes, but somehow I doubt that the chief culprit for that unmanageability
is the use, or misuse, of types.
 

James Kuyper Jr.

Malcolm McLean wrote:
<snip>
> So we do actually need a change to the standard. A type for arbitrary
> array index operations, and a type that is guaranteed to be fast. But
> the majority of use will be the first type.

That's not been my experience.

In my experience, arbitrary array indexing is an extremely rare need. In
most contexts I've ever had to worry about, I knew very precisely a
maximum size for every dimension of the array I was indexing. In most of
those cases that maximum was substantially less than 32767, so 'int' was
more than sufficient. When that wasn't the case, in C90 I would use
'long'; in C99, I'd use int_fast32_t. I've never had reason to index
anything where the index could be larger than 2GB.

On the other hand, almost every time I need an integer type that wasn't
fixed externally by a file or interface specification, I wanted the
fastest type available of sufficient size.
YMMV.
 

Keith Thompson

santosh said:
Malcolm McLean wrote:
[...]
>> I'd say that if you allow objects of over 4GB, then the natural
>> integer size is 64 bits.
>
> Which it is on most 64 bit systems.

Assuming that Malcolm meant "int" rather than "integer", most 64-bit
systems I've seen have 32-bit int and 64-bit long (x86_64, IA-64,
64-bit SPARC).

If int is 64 bits and char is 8 bits, then either there's no 16-bit
integer type or there's no 32-bit integer type (unless the
implementation has C99-style extended integer types, but I've never
seen one that does). That's usually considered too high a price to
pay for the benefit of making int the "natural" size. In any case, I
think 32-bit operations are reasonably fast on such systems anyway.
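
A quick way to see which widths a given implementation actually offers
(illustrative only; sizeof counts storage, so any padding bits are
ignored here):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("char : %d bits\n", CHAR_BIT);
    printf("short: %d bits\n", (int) (CHAR_BIT * sizeof(short)));
    printf("int  : %d bits\n", (int) (CHAR_BIT * sizeof(int)));
    printf("long : %d bits\n", (int) (CHAR_BIT * sizeof(long)));
    return 0;
}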
 

Malcolm McLean

James Kuyper Jr. said:
> Malcolm McLean wrote:
> [...]
>
> That's not been my experience.
>
> In my experience, arbitrary array indexing is an extremely rare need. In
> most contexts I've ever had to worry about, I knew very precisely a
> maximum size for every dimension of the array I was indexing. In most of
> those cases that maximum was substantially less than 32767, so 'int' was
> more than sufficient. When that wasn't the case, in C90 I would use
> 'long'; in C99, I'd use int_fast32_t. I've never had reason to index
> anything where the index could be larger than 2GB.
>
> On the other hand, almost every time I need an integer type that wasn't
> fixed externally by a file or interface specification, I wanted the
> fastest type available of sufficient size.
> YMMV.

I don't know what sort of programs you write.
Most of mine use arrays as by far the most common data structure. Typically
arrays represent lists of objects that are either entered by the user, or
decided at a high level.
So for instance if I need to perform an operation - say calculating the
standard deviation of a set of numbers - the stddev() function won't be
privy to the size of the data. There will be a legitimate expectation that
the function can handle any list of doubles that will fit into the
computer's memory. The actual list might be entered by the user or be
hardcoded by the calling programmer, and there might be some natural limit
- such as the number of people in the world, or the number of letters in
the alphabet, or the number of atom types in a protein - but the function
knows nothing of that.

So the prototype needs to be

double stddev(double *x, type N)

and this is typical. Virtually all functions need to be specified in this
way. The question is what "type" should be called.
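
A minimal sketch of the prototype in question with size_t in the N
position (population standard deviation, two passes, no overflow
handling):

#include <stddef.h>
#include <math.h>

double stddev(const double *x, size_t n)
{
    double mean = 0.0, ss = 0.0;
    size_t i;

    if (n < 2)
        return 0.0;
    for (i = 0; i < n; i++)
        mean += x[i];
    mean /= (double) n;
    for (i = 0; i < n; i++)
        ss += (x[i] - mean) * (x[i] - mean);
    return sqrt(ss / (double) n);
}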
 

santosh

Malcolm McLean wrote:

> So for instance if I need to perform an operation - say calculating
> the standard deviation of a set of numbers - the stddev() function
> won't be privy to the size of the data. There will be a legitimate
> expectation that the function can handle any list of doubles that will
> fit into the computer's memory. The actual list might be entered by
> the user or be hardcoded by the calling programmer, and there might be
> some natural limit - such as the number of people in the world, or the
> number of letters in the alphabet, or the number of atom types in a
> protein - but the function knows nothing of that.
>
> So the prototype needs to be
>
> double stddev(double *x, type N)
>
> and this is typical. Virtually all functions need to be specified in
> this way. The question is what "type" should be called.

Surely size_t is tailor-made for this purpose?
 

pete

santosh said:
> Malcolm McLean wrote:
> <snip>
>
> Surely size_t is tailor-made for this purpose?

Since he already explained that by "list" he meant "array",
size_t is definitely the right choice.
 

Malcolm McLean

santosh said:
> Surely size_t is tailor-made for this purpose?

Yes. There are two main snags.
Firstly, it is unsigned. Although array indices are naturally positive,
intermediate calculations can produce negative values.
Secondly, it is called size_t. If I were supreme ruler of the universe I
could force everyone to use it, but I'm not, and there's just no way you
are going to get consistent usage of a type called "size_t" for an index
variable.
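
The signedness snag is real; the classic trap is a downward loop, sketched
here with an invented function name:

#include <stddef.h>

void clear_backwards(double *x, size_t n)
{
    size_t i;

    /* BUG - do not do this with size_t: i >= 0 is always true for an
       unsigned type, so when i reaches 0 the decrement wraps around and
       the loop never terminates (n - 1 also wraps when n == 0):

       for (i = n - 1; i >= 0; i--)
           x[i] = 0.0;

       One common fix: count down from n and subscript with i - 1. */
    for (i = n; i > 0; i--)
        x[i - 1] = 0.0;
}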
 

Ben Pfaff

Malcolm McLean said:
> Secondly, it is called size_t. If I were supreme ruler of the universe I
> could force everyone to use it, but I'm not, and there's just no way
> you are going to get consistent usage of a type called "size_t" for an
> index variable.

What's wrong with the name size_t?
 

Ben Pfaff

Malcolm McLean said:
> So the prototype needs to be
>
> double stddev(double *x, type N)
>
> and this is typical. Virtually all functions need to be specified in
> this way. The question is what "type" should be called.

In my experience this is often still not abstract enough, and
will eventually get replaced by:

void stddev_start(struct stddev_state *);
void stddev_put(struct stddev_state *, double input);
double stddev_finish(struct stddev_state *);

or something even more abstract.
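
One way such an interface might be fleshed out - a sketch using Welford's
online algorithm, which is my choice of implementation, not something
specified above:

#include <math.h>

struct stddev_state {
    long   n;     /* number of samples seen so far */
    double mean;  /* running mean */
    double m2;    /* running sum of squared deviations from the mean */
};

void stddev_start(struct stddev_state *st)
{
    st->n = 0;
    st->mean = 0.0;
    st->m2 = 0.0;
}

void stddev_put(struct stddev_state *st, double input)
{
    /* Welford's update: numerically stable single-pass variance. */
    double delta = input - st->mean;

    st->n++;
    st->mean += delta / (double) st->n;
    st->m2 += delta * (input - st->mean);
}

double stddev_finish(struct stddev_state *st)
{
    /* Population standard deviation; 0 for fewer than two samples. */
    return (st->n > 1) ? sqrt(st->m2 / (double) st->n) : 0.0;
}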
 
