Bug/Gross InEfficiency in HeathField's fgetline program

  • Thread starter: Antoninus Twink

Charlie Gordon

Richard said:
Charlie Gordon said:
Richard said:
Malcolm McLean wrote:

Which in fact is almost every integer.

Maybe this is true in the sort of programming you do, but there are
many programs where integers are needed for arithmetic far more than
indexing.

I'm pretty sure that's a perception rather than a reality. Let's say
you are storing a list of amounts of money as integers. You write a
routine to take the average. When asked "what does this routine do"
you will answer "it adds up an amount of money, divides by the count,
and reports it."
Actually that's not what it does.

int average(int *money, size_t N)
{
    size_t i;
    int total = 0;

    for (i = 0; i < N; i++)
        total += money[i];
    return total / N;
}

Whilst there are two operations that operate on int (total = 0, total
+=), there are four which operate on i, and one which operates on
total and N. Whether you count C instructions or machine operations,
what the function is mainly doing is calculating a list of indices.

No, there are not. It depends what the compiler does. You could just as
easily make it one off for the initialisation and one per iteration for
the loop check and decrement:

size_t i = N;
while (i--)
    total += money[i];

It's not mainly doing that at all. It is mainly accessing and adding up
numbers.

However the programmer's perception will be that he is mainly working
with amounts of money, because that is what is important to his
human-centric way of looking at the routine.

You confuse me by reading too much into it.

This function adds a set of integers together and then returns the
average.


That's what it attempts to do, but the calculation is quite likely to
cause integer overflow if the array and/or values are large enough,
causing undefined behaviour. Otherwise, the result is the average of
the values of the array, rounded towards zero. If the size N passed is
zero, undefined behaviour is invoked.

It's not quite likely to do anything of the sort if it is operating
within the limits the designer set - otherwise we would use different
types.

I'm not sure I understand why you are writing this. It applies to any
and all code posted here, and a code review isn't the issue.

Any time someone posts a line like

x += y;

one could make the comment "that addition might provoke undefined
behaviour if the values are too large".


It is a matter of common sense! Summing an array of unknown size is likely
to cause overflow, and the programmer may well have been oblivious to this
fact. Since the array may contain large numbers, this way of computing the
average requires as a precondition that the sum stay within the bounds of
an int. This condition is not obvious and should at least be stated in a
comment.

Regarding the division by zero, it is a classic bug in this kind of
function. A simple test prevents undefined (and quite likely catastrophic)
behaviour for the case N == 0. Calling this function with N == 0 should be
allowed: N is the number of values to average, not necessarily the size of
the array.
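For illustration, here is one way to address both points. This sketch is
mine, not from the thread, and returning 0 for an empty array is an assumed
convention:

#include <stddef.h>

int average(const int *money, size_t N)
{
    /* Wider accumulator: a sum of N ints cannot overflow long long
       unless N is astronomically large. */
    long long total = 0;
    size_t i;

    if (N == 0)
        return 0;   /* assumed convention: avoid division by zero */
    for (i = 0; i < N; i++)
        total += money[i];
    return (int)(total / (long long)N);
}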

I'm giving a code review for any code posted on c.l.c: I provide advice that
others find useful and informative. I try to use common sense and help
posters avoid classic mistakes. Sometimes it involves deterring them from
using certain problematic functions from the standard C library (strtok,
strncpy...), sometimes I try to help people write more readably, or use less
error-prone constructs and algorithms. I think my criticism is on average
more constructive than yours.
 

Malcolm McLean

Richard said:
It's not quite likely to do anything of the sort if it is operating
within the limits the designer set - otherwise we would use different
types.

In reality we would use a floating-point rather than an integer type. But
that wasn't the point I was trying to make.
 

Keith Thompson

Malcolm McLean said:
santosh said:
Maybe this is true in the sort of programming you do, but there are many
programs where integers are needed for arithmetic far more than
indexing.
I'm pretty sure that's a perception rather than a reality. Let's say
you are storing a list of amounts of money as integers. You write a
routine to take the average. When asked "what does this routine do"
you will answer "it adds up an amount of money, divides by the count,
and reports it."
Actually that's not what it does.

int average(int *money, size_t N)
{
    size_t i;
    int total = 0;

    for (i = 0; i < N; i++)
        total += money[i];
    return total / N;
}

[...]

Not all data is stored in arrays. Depending on the application, a
significant amount of data might be stored in, say, trees or linked
lists.
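To illustrate (example mine, with a hypothetical node type): when the data
lives in a linked list, the index operations disappear entirely and only
pointer traversal remains.

#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Average over a linked list: no array index anywhere.
   Returns 0 for an empty list (assumed convention). */
int average_list(const struct node *head)
{
    long long total = 0;
    size_t count = 0;

    for (; head != NULL; head = head->next) {
        total += head->value;
        count++;
    }
    return count ? (int)(total / (long long)count) : 0;
}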
 

Flash Gordon

Charlie Gordon wrote, On 26/10/07 05:19:
I assume you wrote your own utility functions for space-padded, fixed-width,
non-zero-terminated char arrays. That was quite easy, was it not?

I didn't, the original developer did. I agree it was easy though.

If the application developer had made the slightly different choice of zero
padding, you would just as easily have written the appropriate utility
functions. IMHO, it would be far less confusing to newbie readers of your
code than to have used strncpy.

There is so much complexity in the code that use of strncpy would be the
least of a newbie's problems.

I would use strncpy because it *might* be optimised by the
implementation, it gets called a significant amount (well, the
equivalent for what we actually use does) and we target several
different systems so optimising it ourselves could be a pain.

Using a specific function with a more appropriate name is a good start to
avoid the confusion strncpy is likely to bring along.

Well, in our own library header I suppose we could do
#define dbfieldcpy(dest,src,size) strncpy(dest,src,size)

Actually, since some of the code pre-dates the ANSI standard there was a
customer written implementation of what was effectively memmove
(although I found a bug in it) which I have replaced with just such a
#define.

The only standard way for a C program to do I/O is to rely on streams
provided by stdio.h, which make use of pointers. Same for command line
arguments: argv is an array of pointers. Actually all arrays convert to
pointers as soon as they are used in expressions (except as the operand to
sizeof).
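A quick demonstration of the decay rule just mentioned (example mine; note
that the address-of operator & is another exception, besides sizeof):

#include <stdio.h>

int main(void)
{
    int a[10];
    int *p = a;     /* a decays: equivalent to &a[0] */

    /* sizeof sees the whole array object, not a pointer. */
    printf("sizeof a = %zu\n", sizeof a);   /* 10 * sizeof(int) */
    printf("sizeof p = %zu\n", sizeof p);   /* size of a pointer */
    return 0;
}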
A program that does not use pointers at all cannot take any form of variable
input, and can only produce a return or exit code, namely EXIT_SUCCESS or
EXIT_FAILURE (or 0, which also reports success and may equal EXIT_SUCCESS).
There is still a wide variety of programs that can be written this way to
just produce a yes/no answer... but an infinitely small fraction of them
all.

A very useful couple of programs in Unix are true and false. One returns
success, the other returns failure, and they do nothing else. These meet
all of the limitations you specify :)
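Indeed, the whole of a classic "true" fits in a pointer-free program
(sketch mine):

#include <stdlib.h>

int main(void)   /* argv omitted, so not a pointer in sight */
{
    return EXIT_SUCCESS;
}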
Ambiguous: only a few clueless and many smart ones, or did you restrict your
working environment to just a few programmers, all of them clueless, and
consider that fortunate ;-?

Only a few programmers I've worked with have been clueless, most have
been good or better.
 

Charlie Gordon

Flash Gordon said:
Charlie Gordon wrote, On 26/10/07 05:19:

There is so much complexity in the code that use of strncpy would be the
least of a newbie's problems.

It is so hard to simplify ;-)
"Perfection is achieved, not when there is nothing more to add, but when
there is nothing left to take away." - Antoine de Saint-Exupery
I would use strncpy because it *might* be optimised by the implementation,
it gets called a significant amount (well, the equivalent for what we
actually use does) and we target several different systems so optimising
it ourselves could be a pain.

OK, as long as you wrap it under a different name.
Well, in our own library header I suppose we could do
#define dbfieldcpy(dest,src,size) strncpy(dest,src,size)

Good. I would use a different order for the arguments:

#define dbfieldcpy(dest,size,src) strncpy(dest,src,size)

because size relates to dest (as in sizeof dest), not src.
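A minimal sketch of such a wrapper (the name and argument order come from
the thread; making it a function rather than a macro is my choice):

#include <string.h>

/* Copy src into a fixed-width, zero-padded field of `size` chars.
   strncpy always writes exactly `size` bytes, zero-padding the rest,
   which is what these fields want.  Note that dest is NOT
   zero-terminated when strlen(src) >= size. */
static void dbfieldcpy(char *dest, size_t size, const char *src)
{
    strncpy(dest, src, size);
}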
Actually, since some of the code pre-dates the ANSI standard there was a
customer written implementation of what was effectively memmove (although
I found a bug in it) which I have replaced with just such a #define.

customer written?
A very useful couple of programs in Unix are true and false. One returns
success, the other returns failure, and they do nothing else. These meet
all of the limitations you specify :)

The built-in shell versions seem to do nothing else indeed, but the
stand-alone utilities parse a couple of command line options, which is
impossible to do without pointers. Try:

$ /bin/true --help
$ /bin/false --version
Only a few programmers I've worked with have been clueless, most have been
good or better.

Lucky you! Were you ever involved in hiring programmers?
 

Kelsey Bjarnason

Malcolm McLean's containers were a hit.

Yes, but the containers are not the unit of shipping; they're simply a
package. They're not an int, they're a class containing ints and shorts.
Guess what? Such packages are very useful. As are the things they
contain. Yet for some reason you wish to do away with every sort of box,
skid, tub, tube, the lot, and only use the largest possible item, the
container.

This, needless to say, is ridiculous, and has no relation to the real
world.
 

Flash Gordon

Malcolm McLean wrote, On 27/10/07 20:37:
In reality we would use a floating-point rather than an integer type.
But that wasn't the point I was trying to make.

On most financial work, and your example was talking about money, you
can only use a floating-point number if all your numbers can be exactly
represented in the floating-point type (in fact, there are often legal
requirements, such as having to pay people what was agreed and paying
the correct amount of tax). This is why a lot of financial SW uses
integer types, generally int where it will fit. Oh, and the SW often has
hundreds of simultaneous users in the environments I deal with and is IO
bound (disk IO to be specific even when using decent storage), so
doubling the size of the integers would massively and unacceptably slow
it down.
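A classic demonstration of the exact-representation problem (example mine):

#include <stdio.h>

int main(void)
{
    double pounds = 0.0;
    long pence = 0;
    int i;

    /* Add ten pence a hundred times. */
    for (i = 0; i < 100; i++) {
        pounds += 0.10;   /* 0.10 has no exact binary representation */
        pence += 10;
    }

    printf("double: %.17g\n", pounds);  /* usually not exactly 10 */
    printf("pence : %ld\n", pence);     /* exactly 1000 */
    return 0;
}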

Others have pointed out enough errors in your response to my post so I
won't bother.
 

Flash Gordon

Charlie Gordon wrote, On 28/10/07 13:31:
It is so hard to simplify ;-)
"Perfection is achieved, not when there is nothing more to add, but when
there is nothing left to take away." - Antoine de Saint-Exupery

I do delete code from it. Some months I've deleted more code than I have
written!
OK, as long as you wrap it under a different name.

I would think about it, but I have not decided whether I would. I might
so that I could change the way fields are stored more easily, but
whether I would do it due to purely naming issues is another matter.
Good. I would use a different order for the arguments:

#define dbfieldcpy(dest,size,src) strncpy(dest,src,size)

because size relates to dest (as in sizeof dest), not src.

That is a matter of style. There is also the possibility that people
might assume the order of the parameters is the same as for strncpy and
memcpy, so no solution is perfect.
customer written?

I meant custom written.
The built-in shell versions seem to do nothing else indeed, but the
stand-alone utilities parse a couple of command line options, which is
impossible to do without pointers. Try:

$ /bin/true --help
$ /bin/false --version

Those are the GNU versions, not the original Unix versions.
Lucky you! Were you ever involved in hiring programmers?

Only to a very small degree.
 

Ben Pfaff

Flash Gordon said:
Oh, and the SW often has hundreds of simultaneous users in the
environments I deal with and is IO bound (disk IO to be
specific even when using decent storage), so doubling the size
of the integers would massively and unacceptably slow it down.

Your software is disk I/O *bandwidth* bound? You must be reading
or writing massive amounts of storage, then. My guess would have
been that it was disk I/O *latency* bound, in which case doubling
the size of the data wouldn't make much difference.
 

Malcolm McLean

Ben Pfaff said:
Your software is disk I/O *bandwidth* bound? You must be reading
or writing massive amounts of storage, then. My guess would have
been that it was disk I/O *latency* bound, in which case doubling
the size of the data wouldn't make much difference.
Even in the very worst case you have to be incredibly unlucky to slow down
the program by more than 50%. (I was going to say "can't", but that's not
strictly true - you can construct a scenario in which you thrash the cache
by doubling int size.) That might sound like a lot to an electrical
engineer, but in software terms it really is on the boundary of a
worthwhile optimisation - if we're not at least doubling speed we aren't
really speeding things up.

However that represents the server at peak capacity. But if it has got
hundreds of simultaneous users, how often is the system running at peak
capacity?

Doubling int sizes would doubtless have some impact on performance. But not
the sort that is being suggested. Then I don't believe that most of the
calculations are of amounts of money, anyway. I think the system spends most
of its time copying data from one location to another.

It could be farming out memory to disk, however. In which case doubling the
memory footprint would have a significant impact. IO is still largely
latency bound, but the number of writes would double.
 

J. J. Farrell

The built-in shell versions seem to do nothing else indeed, but the
stand-alone utilities parse a couple of command line options, which is
impossible to do without pointers. Try:

$ /bin/true --help
$ /bin/false --version

GNU extensions. The POSIX forms don't have command line options and
don't use streams.
 

Richard

*Blink*

You go from strength to strength.

Richard

Actually, FWIW, I "kind of" agree with him.

It's not the norm for the standard types. It is misnamed.

What does the _t mean? type? type?

So where is int_t etc?

Consistency.

I don't like it either for just that reason.
 

Richard Tobin

Richard said:
What does the _t mean? type? type?

So where is int_t etc?

Well, that's historical reasons for you.

If you said "_t indicates type names created with typedef" you'd have
a rule slightly closer to consistency.

-- Richard
 

Richard

Well, that's historical reasons for you.

Yes, it was kind of rhetorical. But frankly, I agree with the other
poster. size_t is ugly despite its reasons.

Frankly I find this "use size_t" advice to be ridiculous most of the
time when it's known that the array index is certainly a lot less than
INT_MAX or whatever it is.
 

Richard Tobin

Frankly I find this "use size_t" advice to be ridiculous most of the
time when it's known that the array index is certainly a lot less than
INT_MAX or whatever it is.

I quite agree. It makes sense to use size_t in libraries for generic
string-handling functions and the like, but if you've got an array of
employees or chess pieces then int is the natural choice.

-- Richard
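A small sketch of the two styles (types and counts invented for
illustration):

#include <stddef.h>

#define NUM_PIECES 32            /* a chess set: trivially below INT_MAX */

static int piece_value[NUM_PIECES];

/* int index: natural when the count is known to be tiny. */
int total_value(void)
{
    int total = 0;
    int i;

    for (i = 0; i < NUM_PIECES; i++)
        total += piece_value[i];
    return total;
}

/* size_t index: the defensive habit for generic routines that may
   one day face arbitrarily large objects. */
size_t count_nonzero(const int *a, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (a[i] != 0)
            count++;
    return count;
}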
 

Kelsey Bjarnason

In reality we would use a floating point rather than an integer type.

For *monetary* calculations? Remind me never to use any of your stuff to
balance the checkbooks.
 

Ben Bacarisse

Richard said:
Frankly I find this "use size_t" advice to be ridiculous most of the
time when it's known that the array index is certainly a lot less than
INT_MAX or whatever it is.

Yes, but has anyone actually advised size_t in a case when the array
index is certainly known to be less than INT_MAX?

Another worry is that things one knows to be true with certainty about
some code today can be false tomorrow. Someone (I think it was Keith
Thompson) said they "got nervous" about certain constructs (it was bit
operations on signed types) and I thought how useful it is to have a
well-developed sense of nervousness in C. 'int' indexes make me a
little nervous. Not a lot, a little.

The '_t' is a bit ugly but what should the committee have done? I've
not seen a good alternative. Malcolm McLean's idea of 64-bit int
everywhere has the wrong cost/benefit profile. The benefits of
liberal use of size_t far outweigh the costs.
 

Richard

Ben Bacarisse said:
Yes, but has anyone actually advised size_t in a case when the array
index is certainly known to be less than INT_MAX?

I struggle to think of any time I have written code where an index to a
single block was more than INT_MAX. However, yes, there has been just
that advice. Not surprisingly.
Another worry is that things one knows to be true with certainty about
some code today can be false tomorrow. Someone (I think it was Keith
Thompson) said they "got nervous" about certain constructs (it was bit
operations on signed types) and I thought how useful it is to have a
well-developed sense of nervousness in C. 'int' indexes make me a
little nervous. Not a lot, a little.

In what way? 0 to N. Nothing to worry about. N is well within the limits
99.9999% of the time. And you need to be aware if it isn't in order to
specifically move to size_t or long or whatever.
The '_t' is a bit ugly but what should the committee have done?

Used a different name?
 

Keith Thompson

Ben Bacarisse said:
The '_t' is a bit ugly but what should the committee have done? I've
not seen a good alternative. Malcolm McLean's idea of 64-bit int
everywhere has the wrong cost/benefit profile. The benefits of
liberal use of size_t far outweigh the costs.

And there are plenty of other types whose names have the _t suffix:
time_t, int32_t, wchar_t, ptrdiff_t, fpos_t.

If you don't like the "_t" suffix, I'm afraid you're just going to
have to find a different language.
 
