Malcolm's new book

Richard Heathfield · Aug 21, 2007

<snip>

In a long reply, pete said a great deal on which I have no particular
comment to make, but one point certainly bears further discussion:

It depends on whether or not you want to consider the
published function definition as a code fragment.
Code fragments don't always have to be compilable.

Absolutely right. But sometimes they *do*. I want to move straight into
the general case here - and do I hear a sigh of relief from MM?

Code fragments come in all manner of shapes and sizes, and are provided
for all sorts of reasons. I think the smallest (visible) code fragment
I've ever written is <code>.</code> - when describing the structure
member operator. Clearly, that fragment is not intended to be compiled
separately!

Generally, if a code fragment is written inline in a paragraph, it is
unreasonable to expect it to compile. For example, I have recently
written the following in an HTML page: "The declarator, <code>int
main(void)</code>, tells the compiler that <code>main</code> is a
function that returns a value of type <code>int</code>." I do not
expect people to expect that int main(void) will compile all by itself!

When a particular line (or short group of lines) of code is being
discussed, again it is unreasonable to expect it to compile:

"But we can use a <code>for</code> loop to make the job a lot easier:
</p>
<pre><code>
for(i = 0; i < 64; i++)
{
putchar('-');
}
</code></pre>
<p>
This is of course the equivalent of:
</p>
<pre><code>
i = 0;
while(i < 64)
{
putchar('-');
i++;
}
</code></pre>"

If an entire function is given, however, then it /is/ reasonable to
expect it to compile, provided that any relevant information supplied
earlier in the discussion is made available to the compiler (e.g.
headers, declarations for called functions, and so on). The obvious
exception is when we get something like:

int foo(void)
{
auto i = 42; /* perhaps this is what we're trying to talk about */
.
.
.
return i;
}

Here, the three lines of dot tell an eloquent story - "stuff goes here"
- and we recognise that this is a device for generalising a function
that, in its present form, is not intended to be compiled.

All of the above are suitable for expositors, didacts, and exegetes to
employ during their expositions, didacticisms, and exegeses.

When one is seeking help, however ("my program is broken, please help,
here's the code"), it is normally essential to provide a complete,
minimal, compilable program - together with enough information to allow
easy reconstruction of test data.

It is also necessary for authors - not just book authors but also online
tutorial writers - to provide complete, compilable, compilED, and
TESTED versions of the code in their books. If this is not done via
some kind of side-channel (Web site, CD in the back of the book, or
whatever), then it must be done in the book itself. If the side-channel
is employed, the abridged book code should be annotated with an
explanation of where the full version can be found (or a general remark
covering this point should be given near the front of the book).

Did I miss anything obvious?

Richard Heathfield · Aug 21, 2007

Ben Bacarisse said:

Curiously, no! The character has been read and stored, the next one
may be '\n' or fgetc may return EOF so the function is allocating
storage it does not (certainly) need.

Oh, good spot, sir. It seemed a bit upside down to me, but I thought
that was just a harmless style difference between MM and myself. I
should read more closely in future.

pete · Aug 21, 2007

Richard Heathfield wrote:

If an entire function is given, however, then it /is/ reasonable to
expect it to compile, provided that any relevant information supplied
earlier in the discussion is made available to the compiler (e.g.
headers, declarations for called functions, and so on). The obvious
exception is when we get something like:

int foo(void)
{
auto i = 42; /* perhaps this is what we're trying to talk about */
.
.
.
return i;
}

Here, the three lines of dot tell an eloquent story
- "stuff goes here"
- and we recognise that this is a device for generalising a function
that, in its present form, is not intended to be compiled.

All of the above are suitable for expositors, didacts, and exegetes to
employ during their expositions, didacticisms, and exegeses.

When one is seeking help, however ("my program is broken, please help,
here's the code"), it is normally essential to provide a complete,
minimal, compilable program
- together with enough information to allow
easy reconstruction of test data.

It is also necessary for authors
- not just book authors but also online tutorial writers
- to provide complete, compilable, compilED, and
TESTED versions of the code in their books. If this is not done via
some kind of side-channel (Web site, CD in the back of the book, or
whatever), then it must be done in the book itself.
If the side-channel
is employed, the abridged book code should be annotated with an
explanation of where the full version can be found
(or a general remark
covering this point should be given near the front of the book).

Did I miss anything obvious?

No.

Eric Sosman · Aug 21, 2007

Richard said:
Malcolm McLean said:

Look harder. Eric Sosman is better at spotting bugs than you are.

I'm so good I can even spot them when they're not there.

I posted a retraction a few days later, but as a hint rather
than as an outright "I dun rong" message. I suspected that the
code's author was not reading the criticisms carefully, and the
fact that he didn't give me both barrels in the traditional c.l.c.
fashion -- not even after a hint -- seems to support my suspicion.

The error itself I blame on my own decreasing visual acuity,
exacerbated by the author's decision to use a small proportional
font for his code: the ! character is very narrow, and snuggled
concealingly into the embrace of the preceding ( in a way that
made it invisible to my aging eyes.

Let's see: I posted the original erroneous bug report on
August 6 and the veiled retraction on August 11. Now on August
20, two weeks after the original claim and nine days after the
broad hint, the author at last is actually paying attention.
I congratulate him on finally refuting my error, and I hope he
will pay similar attention to the other criticisms I and other
C users have raised. They deserve more attention than it turns
out this one did.

Philip Potter · Aug 21, 2007

Richard said:
pete said:

I'm not a fan of early returns. Neither am I a fan of using a
Boolean-style if() in these circumstances - if(buff == NULL) would be
preferable - but neither is actually wrong, so let's move on.
>

getc would be an improvement over fgetc, and I'd rather see the EOF test
within the loop control, but again, neither of these is strictly a
correction.

Why is getc() an improvement over fgetc()? I've heard this stated but
never explained.

if (ch == EOF) {
if (nread == 0) {
free(buff);
return 0;
}
break;
}
buff[nread] = (char) ch;

Click to expand...

The cast is unnecessary. If the conversion can fail, then the cast won't
stop it failing, so it does no good. If the conversion cannot fail,
then the cast doesn't add any value.

nread++;
if (nread == buffsize - 1) {

Click to expand...

This is correct, but isn't good C. Idiomatically,

if(++nread == buffsize - 1) {

is better C.

Where in the C standard does it say this?

To ask the same thing less confrontationally, why is this better? I have
been using C on and off for a number of years, and I would probably
consider myself "average" and certainly not as good as many regulars on
clc. I can read C expressions such as *to++ = *from++ well because they
are idiomatic; however, the if statement you stated is not so idiomatic
that I recognise it immediately. I therefore find the 2-line version
easier to read.

I can see an argument for being terser when you know only good C
programmers will read your code; and I can even see an argument that if
I find the one-line version harder to read then I should jolly well get
better at reading -- but I'm not (yet) convinced by it.

In any case, I don't think this criticism of MM's book is particularly
noteworthy in amongst some of the more substantial claims being thrown
around.

This code gives up too easily. If you can't get a big wodge of extra
space, try for a smaller wodge. And still a smaller one, in a loop that
decreases the demand in some sensible way. All you really *need* at
this point is one extra character.

Would you necessarily include this extra code in an algorithms book? I
haven't made up my mind; on the one hand, I think a comment afterwards
about this would suffice, but on the other hand if you're reimplementing
fgets() then it's definitely a C task and should be done in an
impeccable C manner.

Phil

pete · Aug 21, 2007

Philip Potter wrote:

Why is getc() an improvement over fgetc()?

It's a speed issue, in practice.

getc is typically implemented as a macro.
fgetc is typically not implemented as a macro.

If fgetc were to be implemented in C code,
it would probably look like this:

int fgetc(FILE *stream)
{
return getc(stream);
}

Philip Potter · Aug 21, 2007

pete said:
It's a speed issue, in practice.

getc is typically implemented as a macro.
fgetc is typically not implemented as a macro.

fgets() is required to be implemented as a function AFAIK.

If fgetc were to be implemented in C code,
it would probably look like this:

int fgetc(FILE *stream)
{
return getc(stream);
}

Ah that's where my confusion comes from. I had assumed that getc would
be implemented in terms of fgetc rather than the other way round. But of
course your version makes much more sense.

Phil

Richard Heathfield · Aug 21, 2007

Philip Potter said:

Richard Heathfield wrote:

Why is getc() an improvement over fgetc()? I've heard this stated but
never explained.

pete has addressed this question already.

Where in the C standard does it say this?

Section 42.4.2$4(2).

To ask the same thing less confrontationally, why is this better?

Well, "better" is in the eye of the beholder, of course, but for me it's
a question of using an idiom versus not using an idiom.

I have been using C on and off for a number of years, and I would
probably consider myself "average" and certainly not as good as many
regulars on clc. I can read C expressions such as *to++ = *from++ well
because they are idiomatic;

Quite so. As a C programmer, one ought to familiarise oneself with the
ideas of pre- and post-increment.

however, the if statement you stated is
not so idiomatic that I recognise it immediately. I therefore find the
2-line version easier to read.

Oddly, I found it harder to read! Still, that's folks for you. Never so
happy as when we're complaining...

In any case, I don't think this criticism of MM's book is particularly
noteworthy in amongst some of the more substantial claims being thrown
around.

Oh, I agree - this is all minor league stuff. But it's the code that was
in front of me at the time. I agree that there wasn't very much wrong
with it.

Would you necessarily include this extra code in an algorithms book?

There is no need to put *any* code in an algorithms book. But if you're
*going* to include the code, include the right code - code that works,
implements the algorithm properly, handles errors appropriately, and is
robust in the face of straitened resource circumstances.

I
haven't made up my mind; on the one hand, I think a comment afterwards
about this would suffice, but on the other hand if you're
reimplementing fgets() then it's definitely a C task and should be
done in an impeccable C manner.

And that's basically what he's trying to do, except that he's trying to
improve on fgets (by making it possible to read arbitrarily long lines
in a single call) - but failing, IMHO. His poor design and poor
implementation doom the attempt from the start.

Philip Potter · Aug 21, 2007

Richard said:
Philip Potter said:

Section 42.4.2$4(2).

Well, "better" is in the eye of the beholder, of course, but for me it's
a question of using an idiom versus not using an idiom.

Idioms are not universal; it will depend on your project team which
idioms are recognised and common, and which are less common and harder
to read. I find your focus on your own idioms as "good C" to be a little
absolutist.

Quite so. As a C programmer, one ought to familiarise oneself with the
ideas of pre- and post-increment.

But there's a difference between knowing what the operators do and being
able to read idiomatic statements using those operators quickly. I can
read *to++ = *from++ quickly not because of my knowledge of precedence
but because of my knowledge of idioms. If I saw
*++to = *++from I would take longer to read it, not because any of the
operators are unfamiliar but because it is less idiomatic and less
common. (The fact that it is unidiomatic is a hint that it is probably
wrong.)

Similarly, when I have trouble reading

if (++n == limit) { /*...*/ }

it is not because I am unfamiliar with preincrement, it is because I am
unfamiliar with this idiom.

Oddly, I found it harder to read! Still, that's folks for you. Never so
happy as when we're complaining...

I don't think we disagree too much here.

Just to be clear, I feel that idioms are in general a good thing which
serve to make code terser and easier to read, provided your readers are
familiar with them.

There is no need to put *any* code in an algorithms book. But if you're
*going* to include the code, include the right code - code that works,
implements the algorithm properly, handles errors appropriately, and is
robust in the face of straitened resource circumstances.

I guess this is all the more reason to advocate pseudocode for algorithm
discussions.

The first formal algorithms course I did tried to show the difference
between iteration and recursion using a functional language with no
looping constructs - therefore, both examples used recursion.

Phil

Richard Heathfield · Aug 21, 2007

Philip Potter said:

Idioms are not universal;

Well said. Nevertheless, the reason that they are idioms is that they
are very common. Were they not very common, they would not be idioms.

it will depend on your project team which
idioms are recognised and common, and which are less common and harder
to read. I find your focus on your own idioms as "good C" to be a
little absolutist.

Well, "your own idioms" is a bit of a stretch - the inclusion of
pre-increment in conditional expressions is far from unique to me. On
the "absolutist" charge, I would agree that I nurse a stylistic
paradigm that is rather less liberal than that used by many people, but
I don't insist that "my" style is good C and theirs is not. When I note
departures from "my" style, I tend to note, too, that this is purely a
matter of style, not of correctness or "goodness".

But there's a difference between knowing what the operators do and
being able to read idiomatic statements using those operators quickly.

Indeed, and there is a trade-off between expressive power (which is
sometimes described as "terseness") and clarity (which is sometimes
described as "verbosity") - and it's sometimes a fine line to tread.

I don't think we disagree too much here.

Oh dear. Perhaps we need to try harder.

Keith Thompson · Aug 21, 2007

Philip Potter said:
fgets() is required to be implemented as a function AFAIK.

[...]

I think you meant fgetc, not fgets.

Nearly all function-like things in the C standard library are required
to be implemented as actual functions (meaning, for example, that you
can take their addresses and call them indirectly).

Any standard function may *in addition* be implemented as a macro, as
long as the macro is well-behaved. Such a macro must evaluate each of
its arguments exactly once, and each reference to an argument must be
fully parenthesized, so it can be used in an expression as if it were
a function. (See C99 7.1.4.)

assert() is an exception to this; it's required to be a macro, not a
function, because it needs to expand __FILE__ and __LINE__ at the
point where it's invoked.

getc() is equivalent to fgetc() *except* that if it's implemented as a
macro, it can evaluate its stream argument more than once. It turns
out that this makes it much easier to implement it efficiently (take a
look at your own system's implementation in stdio.h), and the stream
argument would have side effects only in a very unusual program.

For example, if you had an array of FILE* and you wanted to cycle
through it, reading a single character from each, you might write:
c = fgetc(file_array[i++]);
If you use getc rather than fgetc, 'i' might be incremented more than
once. Even though this is a very unusual usage, a conforming
implementation *must* arrange for it to work properly. Thus getc()
has this special permission so it can be more efficent for the 99.99%
of cases where it doesn't matter that the stream argument may be
evaluated more than once.

Note that even if a getc() macro is provided (as it almost certain
is), there still has to be a getc() function as well.

Malcolm McLean · Aug 21, 2007

Kelsey Bjarnason said:
Hmm. Let's see.

Bug 1: if realloc fails, instead of returning whatever has been read so
far, you simply discard it. Apparently it's better to get _no_ data than
partial data.

Enlightenment dawns. Sure it is. No results beats wrong results.

Bug 2: 16-bit implementation, long lines. Let's see what happens:

buffsize starts at 128, adding 64 bytes each time. The last allocation
would have been some 32704 bytes which, combined with the previous 32640
is still viable in a 64K memory, so presumably one can allocate the space
without issue on such a system[1].

Except... buffsize is 32704. Add 64 to this, it becomes 32768 - which, as
I recall, overflows the range of a signed 16-bit int, which is only
required to store -32767 to 32767.

Hmm.

You use realloc.

And integer overflow.

Can said integer overflow result in a buffsize value of zero? Let's see
what happens with realloc on this here implementation, when the size is
zero. According to the docs...

void *realloc(void *ptr, size_t size);

if size is equal to zero, the call is equivalent to free(ptr).
...
If size was equal to 0, either NULL or a pointer suitable to be passed
to free() is returned.

So...

temp = realloc (buff, buffsize + 64);

You've just freed buff and (possibly) set temp to a pointer which is
suitable to be passed to free, but which has _absolutely no usability_
otherwise. Which you promptly try to stuff data into.

Boom for signed integer overflow, boom again for stuffing data into bogus
pointers. And smack at least once for discarding data because _you_ think
it just doesn't matter, instead of letting the user make such a call.

I have no idea whether Eric saw these or something else, or whether he'd
even classify these as bugs or not. They're just some obvious points of
what I'd consider questionable coding.

You're right. Converting to size_t won't necessarily solve the problem. You
can fix it with fancy coding, inappropriate for what the purpose is. That's
why I specify the functions can fail for extreme inputs. I specifically
reiterate that the fucntion is not acceptable as a library function.

Malcolm McLean · Aug 21, 2007

Richard Heathfield said:
There is no need to put *any* code in an algorithms book. But if you're
*going* to include the code, include the right code - code that works,
implements the algorithm properly, handles errors appropriately, and is
robust in the face of straitened resource circumstances.

No. That sort of code is too hard to read. What I am after is the idea of
specifying functions. That's the purpose of chapter one.
The idea is not to implement a library for production use.

And that's basically what he's trying to do, except that he's trying to
improve on fgets (by making it possible to read arbitrarily long lines
in a single call) - but failing, IMHO. His poor design and poor
implementation doom the attempt from the start.

The function is reasonable drop-in for fgets() in a non-security, non-safety
critical environment. It means the programmer doesn't have to worry about
buffer size. A malicious user can crash things by passing a massive line to
the function, but we don't all have to consider that possibility.

santosh · Aug 21, 2007

Malcolm said:
Richard Heathfield wrote:

The function is reasonable drop-in for fgets() in a non-security,
non-safety critical environment. It means the programmer doesn't have to
worry about buffer size. A malicious user can crash things by passing a
massive line to the function, but we don't all have to consider that
possibility.

A very simple check is all that would be required to avoid this DoS issue.
Surely one line is not going to confuse anyone?

What is potentially more confusing is the use of signed types when unsigned
ones are appropriate.

Malcolm McLean · Aug 21, 2007

Eric Sosman said:
I posted a retraction a few days later, but as a hint rather
than as an outright "I dun rong" message. I suspected that the
code's author was not reading the criticisms carefully, and the
fact that he didn't give me both barrels in the traditional c.l.c.
fashion -- not even after a hint -- seems to support my suspicion.

There was so much of it. I decided to wait until the thread had died down,
then list all the errors.

The error itself I blame on my own decreasing visual acuity,
exacerbated by the author's decision to use a small proportional
font for his code: the ! character is very narrow, and snuggled
concealingly into the embrace of the preceding ( in a way that
made it invisible to my aging eyes.

In a way a false bug report is a much more damning criticism than a real
one. A real bug is usually easily corrected. A false bug means that the code
isn' sufficiently clear, which is a lot harder to fix. However dynamically
expanding buffers are inherently hard in C. I cut the code down as much as
possible, and I don't think it can really be made any simpler. As it is
there are not really enough error checks - if the buffer overflows the range
of an int there will be problems.

Malcolm McLean · Aug 21, 2007

santosh said:
What is potentially more confusing is the use of signed types when
unsigned
ones are appropriate.

That's a design decision. Insisting on certain data types means that the
algorithms are harder to port to languages other than C, which may lack
unsigned types.

Al Balmer · Aug 21, 2007

No. That sort of code is too hard to read. What I am after is the idea of
specifying functions. That's the purpose of chapter one.
The idea is not to implement a library for production use.

So write pseudocode. Then there's no danger that your readers will
assume it's good, properly implemented C code.

This would also satisfy your goal (as stated to Santosh) of making it
easier to port.

Kelsey Bjarnason · Aug 21, 2007

[snips]

Enlightenment dawns. Sure it is. No results beats wrong results.

Not _wrong results_, incomplete data.

If I were trying to make a copy of the data off a flaky disk onto a good
one, for example, I'd much rather get as much as possible than have the
routine dictate to me what I'm allowed and not allowed to read. The data
*has been read*. It's not *your* place to tell me I can't use it, that's
my decision.

You're right. Converting to size_t won't necessarily solve the problem. You
can fix it with fancy coding, inappropriate for what the purpose is. That's
why I specify the functions can fail for extreme inputs. I specifically
reiterate that the fucntion is not acceptable as a library function.

Nor for any other purpose, other than to show people how *not* to write
software. Yes, we're agreed.

Kelsey Bjarnason · Aug 21, 2007

That's a design decision. Insisting on certain data types means that the
algorithms are harder to port to languages other than C, which may lack
unsigned types.

Writing in a sensible pseudocode makes it even easier to port, but you
didn't do that. Thus the only viable conclusion is that you weren't
concerned about ease of porting, you just can't write C code.

CBFalconer · Aug 21, 2007

Philip said:
Richard Heathfield wrote:
.... snip ...

Why is getc() an improvement over fgetc()? I've heard this stated
but never explained.

Because getc can be implemented as a macro, provided its action
doesn't have to evaluate the argument more than once. This can
avoid a good deal of procedure calling, and also avoid having to
assign an input buffer. There are special provisions for getc and
putc in the standard to allow this.

NUMERICAL RECIPES	66	Jan 2, 2011
The International Conference on Computing, Networking and DigitalTechnologies (ICCNDT2012)	0	Sep 7, 2012
CFP: GAMEON 2007, November 20-22, 2007, University of Bologna, Bologna,Italy	0	Jun 15, 2007
Senior Statistician Modeller/ SAS Modeler required in bellevue, WA	0	Jul 31, 2009
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Mar 1, 2008
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Jan 12, 2008
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Dec 15, 2007
comp.lang.c Answers (Abridged) to Frequently Asked Questions (FAQ)	0	Nov 1, 2007

Malcolm's new book

Richard Heathfield

Richard Heathfield

pete

Eric Sosman

Philip Potter

pete

Philip Potter

Richard Heathfield

Philip Potter

Richard Heathfield

Keith Thompson

Malcolm McLean

Malcolm McLean

santosh

Malcolm McLean

Malcolm McLean

Al Balmer

Kelsey Bjarnason

Kelsey Bjarnason

CBFalconer

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads