Bug/Gross InEfficiency in HeathField's fgetline program

user923005 · Oct 23, 2007

In my experience this is often still not abstract enough, and
will eventually get replaced by:

void stddev_start(struct stddev_state *);
void stddev_put(struct stddev_state *, double input);

I would recommend:
void stddev_put(struct stddev_state *, Number input);

where Number is a typedef somewhere.

It's not as useful in C as in C++, where 'Number' can be an extended
precision class, but at least it will work transparently on float,
double, and long double.

double stddev_finish(struct stddev_state *);

or something even more abstract.
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x11f6},*p
=b,i=24;for(;p+=!*p;*p/=4)switch(0[p]&3)case 0:{return 0;for(p--;i--;i--)case+
2:{i++;if(i)break;else default:continue;if(0)case 1utchar(a[i&15]);break;}}}

Kelsey Bjarnason · Oct 24, 2007

[snips]

My namesake, Malcom (sic) McLean introduced containerised shipping. You
would have been the first to say "but Mr McLean, not all goods fit easily
into containers. Are you going to pay for all that hold space wasted as
ships sail around with half-filled containers?". It is an inefficiency, but
actually he revolutionised the cargo transport industry, simply by
increasing the ease of handling. Every container fits every crane, every
lorry and every railway truck, because there is only one size.

An interesting little notion.

Having worked loading and unloading those very same container trucks, an
observation arises. Much freight is loaded on skids, chucked into (or
hauled out of) those containers in largish chunks - 150 boxes per skid,
one trip with a forklift to load or unload, total time maybe a minute.

However...

A lot of freight is not loaded on skids. I can't count the number of
times I loaded or unloaded trucks where freight which actually was on
skids on the dock got broken down and loaded onto the truck one box at a
time.

There's a reason for this: the cost for the time spent loading and
unloading the items piecemeal is significantly less than the cost incurred
by the space wasted by loading skids.

More precisely, in order to load skids, there has to be some room above
the skid and to each side, or you can't get the skid in or out. This
space, after the skid is in the truck, is dead space. The skid itself
adds more dead space - about four inches vertical space.

So put that in perspective. In the space where you would have four skids
- two across, two high - each consisting of perhaps 150 boxes, you can now
get an extra 30-60 boxes; if each box "earns" $10, that's $300-$600 extra,
far more than the cost to load and unload - and that's just the first four
feet of the container; there's still 44 or so feet to go. Using the slack
storage of the skids ends up costing you $3,600 to $7,200 per container -
and that's assuming "earnings" per box is a measly $10.

Your notion is, in essence, to use skids everywhere - the largest possible
unit of management. This may lead to _fast_ loading and unloading, but it
is hellishly wasteful of space, and space costs money - whether in a
container or in silicon.

What you're asking, in essence, is that the consumer eat the cost of the
$7200 per container due to inefficient loading, simply to let you load and
unload only with skids. I'm sure this would make _you_ happy, as you
could load and unload more efficiently, but why should someone else pay
the costs of your increased efficiency?

Now, if you're willing to pay the costs, that's different. If you're
willing to say that for every container shipped, you'll pay the $7200 in
wasted space, or for every embedded system, you'll pay the extra $25 in
additional storage, then fine, let's go to it.

If not, then we're left with the basic proposition of you wanting others
to pay significantly increased costs for no benefit to them, just to make
_you_ happy.

This strikes me as not terribly likely to happen.

Kelsey Bjarnason · Oct 24, 2007

[snips]

Yes. There are two main snags.

Firstly it is unsigned.

Which makes sense, given that it is intended to be used to hold sizes,
which can never be negative, and indexes which, likewise, can never be
negative.

Although array indicies are naturally positive,
intermediate calculations can produce negative values.

Damned rarely, IME.

Secondly it is called size_t.

Yes, as differentiated from int or long, etc. It could have been called
"u_index_t", but it wasn't.

santosh · Oct 24, 2007

Malcolm said:
Yes. There are two main snags.
Firstly it is unsigned. Although array indicies are naturally
positive, intermediate calculations can produce negative values.

This only happens occasionally. Maybe in those few cases you can use
something like intmax_t or int_fast64_t or long long?

Secondly it is called size_t. If I was supreme ruler of the universe I
could force everyone to use it, but I'm not, and there's just no way
you are going to get consistent usage of a type called "size_t" for an
index variable.

Do this then:

typedef size_t YOUR_CHOSEN_NAME;

Malcolm McLean · Oct 24, 2007

Kelsey Bjarnason said:
Your notion is, in essence, to use skids everywhere - the largest possible
unit of management. This may lead to _fast_ loading and unloading, but it
is hellishly wasteful of space, and space costs money - whether in a
container or in silicon.

What you're asking, in essence, is that the consumer eat the cost of the
$7200 per container due to inefficient loading, simply to let you load and
unload only with skids. I'm sure this would make _you_ happy, as you
could load and unload more efficiently, but why should someone else pay
the costs of your increased efficiency?

Malcom McLean's containers were a hit. Every commentator acknowledges that
they have massively reduced the cost of shipping. However they are not used
absolutely everywhere. Cars, for instance, are not typically packed into
containers. Neither is oil.

The sums do have to add up. However big software projects do fail,a nd often
expensively, and the reason is almost always the complexity of the programs.
The processors are typically physically capable of executing the
calcualations fast enough. Hardware costs seldom break projects.

One reason software is too complex is that there are too many standards for
representing data. Functions that do essentially the same thing are written
in different ways, pull in huge lists of dependencies, and need to be
rewritten before being included in projects.
So by reducing the number of types we are going in the right direction, and
attacking the bottleneck. Whether the costs will outweigh the benefits is
difficult to prove rigorously, of course, but history suggests that they
will. "You pay the costs personally" is an infantile argument.

pete · Oct 24, 2007

Kelsey said:
[snips]

Yes. There are two main snags.

Click to expand...

Firstly it is unsigned.

Click to expand...

Which makes sense, given that it is intended to be used to hold sizes,
which can never be negative, and indexes which, likewise, can never be
negative.

Although array indicies are naturally positive,
intermediate calculations can produce negative values.

Click to expand...

Damned rarely, IME.

I don't see what difference the negativity of
intermediate calculations makes anyway.

(0u - 10 + 15) is an expression of type unsigned,
with a value of 5u.

Richard Bos · Oct 25, 2007

Malcolm McLean said:
Malcom McLean's containers were a hit. Every commentator acknowledges that
they have massively reduced the cost of shipping.

Keep on dreaming, Malcolm; but please do so in your sleep, not in
comp.lang.c.

Richard

Malcolm McLean · Oct 25, 2007

pete said:
I don't see what difference the negativity of
intermediate calculations makes anyway.

(0u - 10 + 15) is an expression of type unsigned,
with a value of 5u.

If only C guaranteed you an overflow error in such a situation.
Yes it will work. And that's the problem. Eventually if you play that sort
of game the language will bite you.

Charlie Gordon · Oct 26, 2007

Flash Gordon said:
Charlie Gordon wrote, On 20/10/07 13:54:

Except where it is the right tool for the job. If an application developer
had made a slightly different choice, namely 0 padding instead of space
padding, on the application I spend a lot of time working on then it would
make *far* more calls to strncpy than calls strcpy. If I had the time I
would actually make that change to the SW.

I assume you wrote your own utility functions for space padded fixed width
non zero terminated char arrays. That was quite easy, was it not?
If the application developper had made the slightly different choice of 0
padding, you would just as easily have written the appropriate utility
functions. IMHO, it would be far less confusing to newbie readers of your
code then to have used strncpy.

Instead, my response in terms of strncpy and its safety aspects would have
been to ask the question, "Do you have fixed width 0 padded and
potentially not 0 terminated fields in the your SW?" If the answer was
"no" *then* I would say that almost any use of it in that SW would be
wrong and a security risk because it does not null terminate the result in
all conditions and in other conditions can take an unexpectedly long time.

Definitely so!

If the answer was "yes" then I would say the risk was in erroneously using
it when the intent is to produce a C string rather than populate such a
fields and that naming conventions should be used to assist in spotting
correct vs incorrect usage. Further mitigation would include ensuring that
such limits are tested and the usual stuff about reviews and use of grep.

Using a specific function with a more appropriate name is a good start to
avoid the confusion strncpy is likely to bring along.

Not quite, there are *some* programs that can be written without pointers,
just not many.

The only standard way for a C program to do I/O is to rely on streams
provided by stdio.h, which make use of pointers. Same for command line
arguments: argv is an array of pointers. Actually all arrays convert to
pointers as soon as they are used in expressions (except as the operand to
sizeof).
A program that does not use pointers at all cannot take form of variable
input, and can only produce a return or exit code, namely EXIT_SUCCESS or
EXIT_FAILURE (or 0, but that may be the value of either of these).
There is still a wide variety of programs that can be written this way to
just produce a yes/no answer... but an infinitely small fraction of them
all.

I've been fortunate and only worked with a few clueless programmers.

Ambiguous: only a few clueless and many smart ones, or did you restrict your
working environment to just a few programmers, all of them clueless, and
consider that fortunate ;-?

Charlie Gordon · Oct 26, 2007

pete said:
Kelsey said:

[snips]

Surely size_t is tailormade for this purpose?

Click to expand...

Yes. There are two main snags.

Click to expand...

Firstly it is unsigned.

Click to expand...

Which makes sense, given that it is intended to be used to hold sizes,
which can never be negative, and indexes which, likewise, can never be
negative.

Although array indicies are naturally positive,
intermediate calculations can produce negative values.

Click to expand...

Damned rarely, IME.

Click to expand...

I don't see what difference the negativity of
intermediate calculations makes anyway.

(0u - 10 + 15) is an expression of type unsigned,
with a value of 5u.

He was probably thinking of this kind of issue:

index_t find_last(const int *array, index_t len, int val) {
index_t i;
for (i = len - 1; i >= 0; i--) {
if (array == val)
return i;
}
return -1;
}

If index_t is an unsigned type, the loop never ends and big fat undefined
behaviour strikes!
if index_t is signed (such as non-standard ssize_t) the code works just
fine.

santosh · Oct 26, 2007

Charlie said:
pete said:

Kelsey said:

[snips]

On Tue, 23 Oct 2007 22:21:50 +0100, Malcolm McLean wrote:

Surely size_t is tailormade for this purpose?

Yes. There are two main snags.

Firstly it is unsigned.

Which makes sense, given that it is intended to be used to hold
sizes, which can never be negative, and indexes which, likewise, can
never be negative.

Although array indicies are naturally positive,
intermediate calculations can produce negative values.

Damned rarely, IME.

Click to expand...

I don't see what difference the negativity of
intermediate calculations makes anyway.

(0u - 10 + 15) is an expression of type unsigned,
with a value of 5u.

Click to expand...

He was probably thinking of this kind of issue:

index_t find_last(const int *array, index_t len, int val) {
index_t i;
for (i = len - 1; i >= 0; i--) {
if (array == val)
return i;
}
return -1;
}

If index_t is an unsigned type, the loop never ends and big fat
undefined behaviour strikes!
if index_t is signed (such as non-standard ssize_t) the code works
just fine.

If you're really enamoured of this form of loop control then long should
do, or if you need to address above 4Gb then long long.

Charlie Gordon · Oct 26, 2007

santosh said:
Charlie said:

pete said:

Kelsey Bjarnason wrote:

[snips]

On Tue, 23 Oct 2007 22:21:50 +0100, Malcolm McLean wrote:

Surely size_t is tailormade for this purpose?

Yes. There are two main snags.

Firstly it is unsigned.

Which makes sense, given that it is intended to be used to hold
sizes, which can never be negative, and indexes which, likewise, can
never be negative.

Although array indicies are naturally positive,
intermediate calculations can produce negative values.

Damned rarely, IME.

I don't see what difference the negativity of
intermediate calculations makes anyway.

(0u - 10 + 15) is an expression of type unsigned,
with a value of 5u.

Click to expand...

He was probably thinking of this kind of issue:

index_t find_last(const int *array, index_t len, int val) {
index_t i;
for (i = len - 1; i >= 0; i--) {
if (array == val)
return i;
}
return -1;
}

If index_t is an unsigned type, the loop never ends and big fat
undefined behaviour strikes!
if index_t is signed (such as non-standard ssize_t) the code works
just fine.

Click to expand...

If you're really enamoured of this form of loop control then long should
do, or if you need to address above 4Gb then long long.

I am not "enamoured" with this kind of loop, I was just giving an example of
problems that arise when using unsigned types because of the discontinuity
at zero.
I would probably use ssize_t and define it if not available on the target
architecture.
The loop should be written as follows:

index_t find_last(const int *array, index_t len, int val) {
index_t i;
for (i = len; i-- > 0; ) {
if (array == val)
return i;
}
return -1;
}

Ben Bacarisse · Oct 26, 2007

Malcolm McLean said:
If only C guaranteed you an overflow error in such a situation.

<snip>

The list of things you like in C is looking smaller and smaller.
Also, the list of things you want all programs to pay the cost of is
getting longer. To paraphrase Oscar Wilde: "A language cynic knows
the cost of everything and the value of nothing". One of C's main
advantages is the fact that, by leaving so much unspecified, compilers
can generate efficient code on a wide range of hardware.

The cost is, of course, more care required when writing the code, but
other languages are usually available that have made a different
contract between programmer and hardware.

pete · Oct 26, 2007

Charlie said:
santosh said:

Charlie said:

"pete" <[email protected]> a écrit dans le message de (e-mail address removed)...
Kelsey Bjarnason wrote:

[snips]

On Tue, 23 Oct 2007 22:21:50 +0100, Malcolm McLean wrote:

Surely size_t is tailormade for this purpose?

Yes. There are two main snags.

Firstly it is unsigned.

Which makes sense, given that it is intended to be used to hold
sizes, which can never be negative,
and indexes which, likewise, can never be negative.

Although array indicies are naturally positive,
intermediate calculations can produce negative values.

Damned rarely, IME.

I don't see what difference the negativity of
intermediate calculations makes anyway.

(0u - 10 + 15) is an expression of type unsigned,
with a value of 5u.

He was probably thinking of this kind of issue:

index_t find_last(const int *array, index_t len, int val) {
index_t i;
for (i = len - 1; i >= 0; i--) {
if (array == val)
return i;
}
return -1;
}

Click to expand...

Click to expand...

The loop should be written as follows:

index_t find_last(const int *array, index_t len, int val) {
index_t i;
for (i = len; i-- > 0; ) {
if (array == val)
return i;
}
return -1;
}

Click to expand...

Even Malcolm McLean, who doesn't like size_t,
knows how to use size_t *that* well.

http://groups.google.com/group/comp.lang.c/msg/85fac381c99339e3

Malcolm said:

"christian.bau" <[email protected]>
wrote in message

Click to expand...

size_t i = strlen(s);
while(i--)
{
/* loop body */
}

Click to expand...

But christian.bau's post was disturbing.

Malcolm McLean · Oct 27, 2007

Ben Pfaff said:
What's wrong with the name size_t?

The underscore is unacceptable in something so fundamental.
The name looks like it ought to hold an amount of memory in bytes. In fact
that was the oriignal intention. But actually only a tiny minority of
size_ts are used for that. You need it every time you use an array whose
maximum dimension you cannot control, for both the count and the index.
Which in fact is almost every integer.

So Flash's cache usage objection to 64 bit int actually vanishes, unless we
are writing a mishmash of int and size_t indexed code which risks breaking
when the two sizes diverge. You cannot save on cache space by changing the
spelling of a type. What you can do is discourage good practise, and make it
harder to fit code together.

santosh · Oct 27, 2007

Malcolm said:
The underscore is unacceptable in something so fundamental.

That's just a style consideration.

The name looks like it ought to hold an amount of memory in bytes. In
fact that was the oriignal intention.
Yes.

But actually only a tiny minority of size_ts are used for that. You
need it every time you use an array whose maximum dimension you cannot
control, for both the count and the index.

IMHO, both uses are related. Since array indices are a subset of the
array's size, size_t is just as appropriate for both purposes.

Which in fact is almost every integer.

Maybe this is true in the sort of programming you do, but there are many
programs where integers are needed for arithmetic far more than
indexing.

<snip>

Malcolm McLean · Oct 27, 2007

santosh said:
Maybe this is true in the sort of programming you do, but there are many
programs where integers are needed for arithmetic far more than
indexing.

I'm pretty sure that's a perception rather than a reality. Let's say you are
storing a list of amounts of money as integers. You write a routine to take
the average. When asked "what does this routine do" you will answer "it adds
up an amount of money, divides by the count, and reports it."
Actually that's not what it does.

int average(int *money, size_t N)
{
size_t i;
int total = 0;

for(i=0;i<N;i++)
total += money;
return total/N;
}

Whilst there are two operations that operate on int (total =0, total +=),
there are four which operate on i, and one which operates on total and N.
Whether you count C instructions or machine operations, what the function is
mainly doing is calculating a list of indices.
However the programmer's perception will be that he is mainly working with
amounts of money, because that is what is important to his human-centric way
of looking at the routine.

Richard · Oct 27, 2007

Malcolm McLean said:
santosh said:

Maybe this is true in the sort of programming you do, but there are many
programs where integers are needed for arithmetic far more than
indexing.

Click to expand...

I'm pretty sure that's a perception rather than a reality. Let's say
you are storing a list of amounts of money as integers. You write a
routine to take the average. When asked "what does this routine do"
you will answer "it adds up an amount of money, divides by the count,
and reports it."
Actually that's not what it does.

int average(int *money, size_t N)
{
size_t i;
int total = 0;

for(i=0;i<N;i++)
total += money;
return total/N;
}

Whilst there are two operations that operate on int (total =0, total
+=), there are four which operate on i, and one which operates on

No there are not. it depends what the compiler does. You could as easily
make it 1 one off for initilisation and one for the loop and check

int i=N;
while(i--)
total += money;

total and N. Whether you count C instructions or machine operations,
what the function is mainly doing is calculating a list of indices.

Click to expand...

Its not mainly doing that at all. It is mainly accessing and adding up numbers.

However the programmer's perception will be that he is mainly working
with amounts of money, because that is what is important to his
human-centric way of looking at the routine.

Click to expand...

You confuse me by trying to think too much into it.

This function adds a set integers together and then returns the average.

Charlie Gordon · Oct 27, 2007

Richard said:
Malcolm McLean said:

santosh said:

Malcolm McLean wrote:

Which in fact is almost every integer.

Maybe this is true in the sort of programming you do, but there are many
programs where integers are needed for arithmetic far more than
indexing.

Click to expand...

I'm pretty sure that's a perception rather than a reality. Let's say
you are storing a list of amounts of money as integers. You write a
routine to take the average. When asked "what does this routine do"
you will answer "it adds up an amount of money, divides by the count,
and reports it."
Actually that's not what it does.

int average(int *money, size_t N)
{
size_t i;
int total = 0;

for(i=0;i<N;i++)
total += money;
return total/N;
}

Whilst there are two operations that operate on int (total =0, total
+=), there are four which operate on i, and one which operates on

Click to expand...

No there are not. it depends what the compiler does. You could as easily
make it 1 one off for initilisation and one for the loop and check

int i=N;
while(i--)
total += money;

total and N. Whether you count C instructions or machine operations,
what the function is mainly doing is calculating a list of indices.

Click to expand...

Its not mainly doing that at all. It is mainly accessing and adding up
numbers.

However the programmer's perception will be that he is mainly working
with amounts of money, because that is what is important to his
human-centric way of looking at the routine.

Click to expand...

You confuse me by trying to think too much into it.

This function adds a set integers together and then returns the average.

That's what it attempts to do, but the calculation is quite likely to cause
integer overflow if the array and/or values are large enough, causing
undefined behaviour. Otherwise, the result is the average of the values of
the array, rounded towards zero. If the size N passed is zero, undefined
behaviour is invoked.

Richard · Oct 27, 2007

Charlie Gordon said:
Richard said:

Malcolm McLean said:

Malcolm McLean wrote:

Which in fact is almost every integer.

Maybe this is true in the sort of programming you do, but there are many
programs where integers are needed for arithmetic far more than
indexing.

I'm pretty sure that's a perception rather than a reality. Let's say
you are storing a list of amounts of money as integers. You write a
routine to take the average. When asked "what does this routine do"
you will answer "it adds up an amount of money, divides by the count,
and reports it."
Actually that's not what it does.

int average(int *money, size_t N)
{
size_t i;
int total = 0;

for(i=0;i<N;i++)
total += money;
return total/N;
}

Whilst there are two operations that operate on int (total =0, total
+=), there are four which operate on i, and one which operates on

Click to expand...

No there are not. it depends what the compiler does. You could as easily
make it 1 one off for initilisation and one for the loop and check

int i=N;
while(i--)
total += money;

total and N. Whether you count C instructions or machine operations,
what the function is mainly doing is calculating a list of indices.

Click to expand...

Its not mainly doing that at all. It is mainly accessing and adding up
numbers.

However the programmer's perception will be that he is mainly working
with amounts of money, because that is what is important to his
human-centric way of looking at the routine.

Click to expand...

You confuse me by trying to think too much into it.

This function adds a set integers together and then returns the average.

Click to expand...

That's what it attempts to do, but the calculation is quite likely to cause
integer overflow if the array and/or values are large enough, causing

Its not quite likely to do anything of the sort if it is operating in
the limits the designer set - otherwise we would use different types.

undefined behaviour. Otherwise, the result is the average of the values of
the array, rounded towards zero. If the size N passed is zero, undefined
behaviour is invoked.

Click to expand...

I'm not sure I understand why you are writing this. This applies to any
and all code posted here and a code review isn't the issue here.

Anytime someone posts a line like

x+=y;

One could make this comment "that addition might provoke undefined
behaviour if the values are too large".

Fibonacci	0	May 13, 2023
Adding adressing of IPv6 to program	1	Feb 16, 2023
C language. work with text	3	Dec 9, 2021
code review	26	Feb 6, 2004
Can't solve problems! please Help	0	Sep 26, 2022
compressing charatcers	35	Apr 2, 2014
Strange bug	65	Nov 19, 2010
K&R exercise 5-5	10	Feb 19, 2007

Bug/Gross InEfficiency in HeathField's fgetline program

user923005

Kelsey Bjarnason

Kelsey Bjarnason

santosh

Malcolm McLean

pete

Richard Bos

Malcolm McLean

Charlie Gordon

Charlie Gordon

santosh

Charlie Gordon

Ben Bacarisse

pete

Malcolm McLean

santosh

Malcolm McLean

Richard

Charlie Gordon

Richard

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads