Don Knuth and the C language


jacob navia

The way C handles pointers, for example, was a brilliant innovation; it
solved a lot of problems that we had before in data structuring and made
the programs look good afterwards. C isn't the perfect language, no
language is, but I think it has a lot of virtues, and you can avoid the
parts you don't like. I do like C as a language, especially because it
blends in with the operating system (if you're using UNIX, for example).

Don Knuth, as quoted at
http://www.softpanorama.org/Lang/c.shtml
 

Bill Cunningham

jacob navia said:
The way C handles pointers, for example, was a brilliant innovation; it
solved a lot of problems that we had before in data structuring and made
the programs look good afterwards. C isn't the perfect language, no
language is, but I think it has a lot of virtues, and you can avoid the
parts you don't like. I do like C as a language, especially because it
blends in with the operating system (if you're using UNIX, for example).

Don Knuth, as quoted at
http://www.softpanorama.org/Lang/c.shtml

Your point?
 

Kaz Kylheku

The way C handles pointers, for example, was a brilliant innovation; it
solved a lot of problems that we had before in data structuring and made

Just days ago I quoted this in a comp.lang.misc thread.
 

glen herrmannsfeldt

Just days ago I quoted this in a comp.lang.misc thread.

Seems to me that the main feature of C and C pointers is that they
are easy to implement in a small, efficient compiler on small-memory
machines, generating small, efficient object code.

One other language with pointers that might have been used before C
was PL/I. Also, PL/I was the primary language for Multics; most of
Multics was written in it.

As with Java, there is no PL/I feature similar to incrementing
or decrementing a pointer. (A C feature that makes it especially
easy to create buffer overflow problems.)

Strings, structures, and arrays are normally passed by descriptor
in PL/I, such that the called routine has the appropriate bounds
information.
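
A rough C analogue of a PL/I descriptor is a fat pointer that carries
bounds along with the address, so the callee can do its own checking
(the names here are illustrative, not from any PL/I runtime):

#include <stddef.h>

/* Hand-rolled "descriptor": pointer plus element count. A PL/I
 * compiler builds and passes something like this automatically;
 * in C you do it by hand. */
struct descriptor {
    double *base;
    size_t  len;
};

double sum(struct descriptor d)
{
    double total = 0.0;
    for (size_t i = 0; i < d.len; i++)  /* bounds are available to check */
        total += d.base[i];
    return total;
}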

Partial qualification of structure references (did anyone ever
consider adding that to C?) makes it easy to work with complicated
structures. (A little like this in Java, but more general.)

Seems to me that if you have more memory available, and aren't
in so much of a hurry, other pointer implementations aren't
so bad.

-- glen
 

Ian Collins

glen said:
Partial qualification of structure references (did anyone ever
consider adding that to C?) makes it easy to work with complicated
structures. (A little like this in Java, but more general.)

Can you explain partial qualification of structure references?
 

glen herrmannsfeldt

Ian Collins said:
(snip)
Can you explain partial qualification of structure references?

In Java, you can reference instance variables without the this.
qualifier, as long as there isn't a local variable with the
same name.

In PL/I, you can leave out structure qualifiers as long as the
result isn't ambiguous.

struct {
    int x;
} y;

x = 3; // instead of y.x = 3;

Or leave out intermediate qualifiers.

struct {
    struct {
        int c;
    } b;
} a;
double c;

c = 1.3;
a.b.c = 4;

I don't think I ever wrote nested C structs that way before, but:

DCL 1 A, 2 B, 3 C FIXED BIN(31,0);
DCL C FLOAT BIN(53);

C=1.3;
A.C=4;

Since there is another variable C, either B.C or A.C is a partial
qualification for A.B.C. If there were no other C, then C alone would be enough.

If a name is appropriately qualified for its nesting level, then
names at outer nesting levels don't make it ambiguous.

-- glen
 

jacob navia

On 01/05/2014 04:23, glen herrmannsfeldt wrote:
In Java, you can reference instance variables without the this.
qualifier, as long as there isn't a local variable with the
same name.

In PL/I, you can leave out structure qualifiers as long as the
result isn't ambiguous.

struct {
    int x;
} y;

x = 3; // instead of y.x = 3;

Or leave out intermediate qualifiers.

struct {
    struct {
        int c;
    } b;
} a;
double c;

c = 1.3;
a.b.c = 4;

I don't think I ever wrote nested C structs that way before, but:

DCL 1 A, 2 B, 3 C FIXED BIN(31,0);
DCL C FLOAT BIN(53);

C=1.3;
A.C=4;

Since there is another variable C, either B.C or A.C is a partial
qualification for A.B.C. If there were no other C, then C alone would be enough.

If a name is appropriately qualified for its nesting level, then
names at outer nesting levels don't make it ambiguous.

-- glen

In the lcc-win C compiler you can leave out the path to a member of the
structure, a similar but not identical feature that I took from the
Plan 9 compiler.

struct inner { int inner_member; };

struct outer { struct inner I; };

struct outer Outer;
/* ... */
Outer.inner_member = 23;

instead of

Outer.I.inner_member = 23;
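
C11 later standardized a narrower form of this as anonymous structures
and unions: the intermediate qualifier can be omitted only when the inner
struct is itself unnamed, unlike the lcc-win extension above, which skips
a named member. A minimal sketch:

#include <stdio.h>

struct outer {
    struct {                  /* no tag and no member name: anonymous */
        int inner_member;
    };
};

int main(void)
{
    struct outer Outer;
    Outer.inner_member = 23;  /* as if declared directly in struct outer */
    printf("%d\n", Outer.inner_member);
}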
 

Malcolm McLean

Seems to me that the main feature of C and C pointers is that they
are easy to implement in a small, efficient compiler on small-memory
machines, generating small, efficient object code.

One other language with pointers that might have been used before C
was PL/I. Also, PL/I was the primary language for Multics; most of
Multics was written in it.

As with Java, there is no PL/I feature similar to incrementing
or decrementing a pointer. (A C feature that makes it especially
easy to create buffer overflow problems.)

Strings, structures, and arrays are normally passed by descriptor
in PL/I, such that the called routine has the appropriate bounds
information.

Partial qualification of structure references (did anyone ever
consider adding that to C?) makes it easy to work with complicated
structures. (A little like this in Java, but more general.)

Seems to me that if you have more memory available, and aren't
in so much of a hurry, other pointer implementations aren't
so bad.
C basically says "here's a memory buffer, now do what you want with it".
Other languages tie you up with safety restrictions, so if an object is a
string you can only interact with it using the string-handling functions
provided by the language.
The hand-holding is tolerable if you're writing fairly short, simple programs,
but it becomes a real nuisance once programs get above a certain level of
complexity: they need to interface with other modules that might be written
using other conventions for types, they need to call modules written in
other languages, and they need loosely coupled, generic linking, which is
best achieved by passing function pointers and void *s.

For instance, say we want to implement compression. In C, you write
void *compress(void *data, int len, int *clen)
and it's just a case of flattening all the data before passing it to the routine.

In other languages, this is often a lot harder to achieve. Sometimes there's a
serialise method, and every single object needs to derive from "Serialisable" and
implement the method correctly, which leaves huge scope for an error somewhere.
Then it will write to a serial_buffer, so the compressor has to take a
serial_buffer, so it's no longer an independent module.
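
To make the shape of that interface concrete, here is a toy run-length
encoder behind the same prototype. The RLE scheme and the malloc'd
result are illustrative choices, not anything specified in this thread:

#include <stdlib.h>

/* Toy run-length encoder with the loosely coupled interface above:
 * opaque buffer in, malloc'd opaque buffer out, length via *clen.
 * Worst case the output is twice the input size. */
void *compress(void *data, int len, int *clen)
{
    const unsigned char *in = data;
    unsigned char *out = malloc(len > 0 ? 2 * (size_t)len : 1);
    int n = 0;

    if (out == NULL)
        return NULL;
    for (int i = 0; i < len; ) {
        int run = 1;
        while (i + run < len && in[i + run] == in[i] && run < 255)
            run++;
        out[n++] = (unsigned char)run;  /* run length */
        out[n++] = in[i];               /* repeated byte value */
        i += run;
    }
    *clen = n;
    return out;
}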
 

Malcolm McLean

Richard said:
You have it arseways. This "hand holding" ensures compile time type
safety. If you haven't designed your type hierarchy and need a fudge
like "void *" to access your data at the higher level then you've fucked
up big time.

It's not "hand holding": it's very clever enforcement of the design and
the representative data types.
void * is not a "fudge". It's an inherent feature of the C language. We can pass
around memory buffers, and either treat them as sequences of bytes, or
pass them to client code which understands them.
It's C's way of providing flexibility, loose coupling, and abstraction.

Now I'm fully aware that a lot of people think that more strictly-typed languages
are better as projects scale in complexity. I argued the opposite, and gave
reasons. You're going to have to do better than reiterate the conventional
wisdom you were maybe taught as an undergraduate if you're going to be
treated as a poster who deserves any type of respect.

Strict type-checking is hardly "very clever". It's a common design feature of many
languages. C is clever in that it provides type checking where the programmer
is likely to make a mistake (e.g. passing the address of a float to a function
that expects a double *), but doesn't enforce it where it's unlikely he's making
a mistake (e.g. if he tries to access a double as a sequence of bytes, it's likely that
he knows exactly what he's doing).
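
The byte-level access described here is the standard one: any object may
be inspected through an unsigned char pointer. A small example:

#include <stdio.h>

/* Print the object representation of a double, byte by byte. C
 * guarantees this access pattern; the byte values printed depend
 * on the machine's floating-point format and byte order. */
int main(void)
{
    double d = 1.25;
    const unsigned char *p = (const unsigned char *)&d;

    for (size_t i = 0; i < sizeof d; i++)
        printf("%02x ", p[i]);
    putchar('\n');
}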
 

glen herrmannsfeldt

Malcolm McLean said:
On Thursday, May 1, 2014 12:52:08 PM UTC+1, Richard wrote:
(snip)
void * is not a "fudge". It's an inherent feature of the C language.
We can pass around memory buffers, and either treat them as
sequences of bytes, or pass them to client code which understands them.
It's C's way of providing flexibility, loose coupling, and abstraction.

Well, for one, you already assume that it can be processed as a
sequence of 8 bit bytes. Theoretically, you check CHAR_BIT before
doing that, but that is pretty rare.
Now I'm fully aware that a lot of people think that more
strictly-typed languages are better as projects scale in complexity.
I argued the opposite, and gave reasons. You're going to have to
do better than reiterate the conventional wisdom you were maybe
taught as an undergraduate if you're going to be treated as a
poster who deserves any type of respect.
Strict type-checking is hardly "very clever". It's a common design
feature of many languages. C is clever in that it provides
type checking where the programmer is likely to make a
mistake (e.g. passing the address of a float to a function that
expects a double *), but doesn't enforce it where it's
unlikely he's making a mistake (e.g. if he tries to access a
double as a sequence of bytes, it's likely that he knows
exactly what he's doing).

At the time that PL/I, and even C, were being developed, it wasn't
so obvious that everyone would use 8 bit bytes. Multics was on
a 36 bit machine, and other 36 bit machines were reasonably
popular at the time, such as the IBM 7090 or 7094, and the DEC
PDP-10.

The PL/I UNSPEC function and pseudo-variable allow converting
any type to/from a bit string.

As someone mentioned compression, say I take the C source for
the unix compress program and compile it on a PDP-10, so I can
compress programs and send them to you. With 36 bit words, the
value of CHAR_BIT might be 9 or 18, and assuming (not likely)
that the person who wrote compress even checked CHAR_BIT, it
isn't likely that you will be able to decompress and read
the files that I send you.

Seems to me some combination of IBM S/360, S/370 (for larger
computer installations) and VAX (for smaller ones) popularized
the 8 bit byte such that the C assumptions made sense.

-- glen
 

Keith Thompson

glen herrmannsfeldt said:
At the time that PL/I, and even C, were being developed, it wasn't
so obvious that everyone would use 8 bit bytes. Multics was on
a 36 bit machine, and other 36 bit machines were reasonably
popular at the time, such as the IBM 7090 or 7094, and the DEC
PDP-10.

The PL/I UNSPEC function and pseudo-variable allow converting
any type to/from a bit string.

As someone mentioned compression, say I take the C source for
the unix compress program and compile it on a PDP-10, so I can
compress programs and send them to you. With 36 bit words, the
value of CHAR_BIT might be 9 or 18, and assuming (not likely)
that the person who wrote compress even checked CHAR_BIT, it
isn't likely that you will be able to decompress and read
the files that I send you.

Seems to me some combination of IBM S/360, S/370 (for larger
computer installations) and VAX (for smaller ones) popularized
the 8 bit byte such that the C assumptions made sense.

I'm not sure what you mean by "the C assumptions". As you know,
C very specifically does not assume 8-bit bytes (though it
does assume *at least* 8-bit bytes).

At least some compression algorithms are defined in terms of 8-bit
bytes (see RFCs 1951 and 1952, for example). That limitation isn't
imposed by C. No doubt you could define a compression algorithm
that operates on bit sequences, but then you'd need metadata to
indicate how many bits of the last byte are part of the data.

It's not clear (to me) how you'd even copy an 8-bit file to a 36-bit
system, let alone compress or decompress it, though I presume there
are ways to do it.
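
One minimal convention for that metadata (purely illustrative, not taken
from RFC 1951 or 1952): zero-pad the final data byte and append a single
trailer byte holding the number of valid bits in it.

#include <stddef.h>

/* Recover the bit length of a packed stream laid out as: data bytes
 * (last one zero-padded), then one trailer byte giving how many bits
 * of that last data byte are real. Assumes nbytes >= 2 and the 8-bit
 * bytes that the RFCs above describe. */
size_t stream_bit_length(const unsigned char *buf, size_t nbytes)
{
    unsigned trailer = buf[nbytes - 1];  /* valid bits in last data byte */
    size_t data_bytes = nbytes - 1;      /* everything except the trailer */
    return (data_bytes - 1) * 8 + trailer;
}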
 

Malcolm McLean

I'm not sure what you mean by "the C assumptions". As you know,
C very specifically does not assume 8-bit bytes (though it
does assume *at least* 8-bit bytes).

At least some compression algorithms are defined in terms of 8-bit
bytes (see RFCs 1951 and 1952, for example). That limitation isn't
imposed by C. No doubt you could define a compression algorithm
that operates on bit sequences, but then you'd need metadata to
indicate how many bits of the last byte are part of the data.
I gave the prototype of the function as

void *compress(void *data, int len, int *clen);

Now let's suppose that's all the documentation you have, except that you
also have a matching decompressor.
The two possibilities are that compress treats data as a bitstream, or as a
sequence of 8 bit bytes. The compression buffer will have the matching
assumption. So all you need to do on a system with bytes greater than
8 bits is

unsigned char test[4] = {0, 0x100, 1, 0xFF};
unsigned char out[40];
void *comp;
int clen, olen;

comp = compress(test, 4, &clen);
decompress(out, comp, clen, &olen);

printf("%x %x %x %x\n", out[0], out[1], out[2], out[3]);
printf("%d compressed %d decompressed\n", clen, olen);

That little bit of exploratory programming will tell you how bytes
greater than 255 are handled, and so how to call the function.

It's easy, it's flexible, it's portable, even if the programmer hasn't
quite foreseen everything.

By contrast if compress is written in another language, on top
of a serial buffer, then how is that serial buffer going to behave?
How are the two going to interact?
 

glen herrmannsfeldt

(snip, I wrote)
I'm not sure what you mean by "the C assumptions". As you know,
C very specifically does not assume 8-bit bytes (though it
does assume *at least* 8-bit bytes).

Well, someone previously mentioned that C allows one to consider
a double as bytes. And yes you can check CHAR_BIT before doing
that, but most people don't.
At least some compression algorithms are defined in terms of 8-bit
bytes (see RFCs 1951 and 1952, for example). That limitation isn't
imposed by C. No doubt you could define a compression algorithm
that operates on bit sequences, but then you'd need metadata to
indicate how many bits of the last byte are part of the data.

Well, say you are on a 36 bit system and CHAR_BIT is 9.
Assuming you check it, you can write a compression algorithm that
will read and write sequences of 9 bit bytes. The C library, then,
has to read 36 bit words, pass them in as 9 bit bytes, take the 9
bit bytes that come out, and reassemble them as 36 bit words before
writing to disk.

LZW (compress or gzip) normally writes out groups of bits between 9
and 16 bits long. (Lots of shift, AND, and OR operations.)
It probably isn't hard to use CHAR_BIT to change that to compress
an input of 9 bit bytes.
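
A sketch of that kind of bit packer, parameterized by CHAR_BIT as
suggested above (hypothetical names, not the actual compress or gzip
source):

#include <limits.h>
#include <stdio.h>

/* Accumulate variable-width codes (9..16 bits for LZW) and flush
 * them as CHAR_BIT-wide bytes, least significant bits first:
 * nothing but shifts, ANDs, and ORs. */
typedef struct {
    unsigned long acc;  /* pending bits, LSB first */
    int nbits;          /* number of valid bits in acc */
    FILE *out;
} bitwriter;

void put_code(bitwriter *bw, unsigned code, int width)
{
    bw->acc |= (unsigned long)code << bw->nbits;
    bw->nbits += width;
    while (bw->nbits >= CHAR_BIT) {  /* flush each complete byte */
        fputc((int)(bw->acc & ((1UL << CHAR_BIT) - 1)), bw->out);
        bw->acc >>= CHAR_BIT;
        bw->nbits -= CHAR_BIT;
    }
}
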
It's not clear (to me) how you'd even copy an 8-bit file to a 36-bit
system, let alone compress or decompress it, though I presume there
are ways to do it.

That is where it gets interesting. I think Kermit knows how to do it.
DEC systems store ASCII text as five 7 bit characters per word, with
one bit left over. (Some editors would use that bit to indicate
that a line had a line number in it.) In the case of ASCII text
files, you have to consider that when transferring to/from 8 bit
systems.

A compressed stream of 9 bit bytes won't work at all when it gets
to an 8 bit system. The CHAR_BIT minimum of 8 makes no allowance
for a system where text files are a sequence of 7 bit bytes in 36
bit words.

While in theory C allows for other than 8 bit bytes, it is too
easy for programmers to assume it, and that, in turn, may have helped
the popularity of 8 bit systems.

-- glen
 

Stefan Ram

Malcolm McLean said:
C basically says "here's a memory buffer, now do what you want with it".

In C one actually only gets the start address, but has to
learn the size of the buffer by other means. (The size of
the pointee, which is provided by the type system of C, can
only be employed for static buffers.)
 

Keith Thompson

Malcolm McLean said:
I gave the prototype of the function as

void *compress(void *data, int len, int *clen);

So the compressed data goes into the same buffer as the uncompressed
data? If compression makes it bigger, where does the extra data go?
But that's not directly relevant to the current point, so let's ignore
it for now. (Of course I'd use size_t rather than int, but we can set
that aside as well.)
Now let's suppose that's all the documentation you have, except that you
also have a matching decompressor.

It's unlikely, and frankly unacceptable, that that would be the only
documentation. The algorithm used needn't be documented, but there had
better be something that tells me how to use it.
The two possibilities are that compress treats data as a bitstream, or as a
sequence of 8 bit bytes. The compression buffer will have the matching
assumption.

If it operates on bitstreams (which needn't be a whole number of octets
or of bytes), how does it tell you how many bits of the final octet or
byte are part of the bitstream?
So all you need to do on a system with bytes greater than
8 bits is

unsigned char test[4] = {0, 0x100, 1, 0xFF};
unsigned char out[40];
void *comp;
int clen, olen;

comp = compress(test, 4, &clen);
decompress(out, comp, clen, &olen);

printf("%x %x %x %x\n", out[0], out[1], out[2], out[3]);
printf("%d compressed %d decompressed\n", clen, olen);

That little bit of exploratory programming will tell you how bytes
greater than 255 are handled, and so how to call the function.

A function with a built-in assumption of CHAR_BIT==8 could exhibit a
wide variety of behaviors given bytes greater than 255; I'm not
convinced that examining the output of a single call would be that
useful.
It's easy, it's flexible, it's portable, even if the programmer hasn't
quite foreseen everything.

By contrast if compress is written in another language, on top
of a serial buffer, then how is that serial buffer going to behave?
How are the two going to interact?

I have no idea.
 

Keith Thompson

glen herrmannsfeldt said:
Well, someone previously mentioned that C allows one to consider
a double as bytes. And yes you can check CHAR_BIT before doing
that, but most people don't.

Yes, C allows you to consider any object as an array of unsigned
char, i.e., of bytes; that's how the standard defines "object
representation".

I still don't understand what you mean by "the C assumption".
Certainly a lot of C programmers write code that assumes CHAR_BIT==8
(and depending on the context, that can be a perfectly reasonable
assumption). The language can't *force* programmers to write code
that's portable to systems with CHAR_BIT > 8. It provides all
the tools to do so, but interoperability between, say, 8-bit and
9-bit systems is trickier.
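
One of those tools, for code that genuinely requires octets, is a
compile-time check so the build fails rather than misbehaving on a
9-bit system (C11 syntax; a guard, not a cure for interoperability):

#include <limits.h>

/* Refuse to compile where the octet assumption does not hold. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");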

[...]
While in theory C allows for other than 8 bit bytes, it is too
easy for programmers to assume it, and that, in turn, may have helped
the popularity of 8 bit systems.

So is that what you mean by "the C assumption", that C *programmers*
make an assumption that isn't imposed by the C standard? If so, that's
a perfectly valid point, but I wouldn't use that phrase to describe it.
 

James Kuyper

Stefan Ram said:
In C one actually only gets the start address, but has to
learn the size of the buffer by other means. (The size of
the pointee, which is provided by the type system of C, can
only be employed for static buffers.)

I'm not sure precisely what you mean by that statement. sizeof(*p) can
only give you the size of the type that p points at; C has no construct
that tells you the largest integer n such that p[n] can be safely
evaluated. It's up to the developer to keep track of that information,
and to choose the methods by which that information is passed around.
That's equally true whether or not static allocation is used.
 

Keith Thompson

Stefan Ram said:
In C one actually only gets the start address, but has to
learn the size of the buffer by other means. (The size of
the pointee, which is provided by the type system of C, can
only be employed for static buffers.)

"sizeof *ptr" is useful only if ptr points to a single object and
you're not interested in treating that object as the first element
of an array. One special case of this is when ptr is a pointer to
an array, something that's not used very often in practice.

I think that by "static" you probably mean something whose size
is determined at compile time; that's not what the word "static"
means in C.

Note that you can define a pointer to a VLA type, which opens up
some interesting possibilities:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t width = 100;
    size_t height = 100;
    typedef double matrix[width][height];
    matrix *ptr = malloc(sizeof *ptr);
    if (ptr == NULL) {
        fprintf(stderr, "malloc failed\n");
        exit(EXIT_FAILURE);
    }
    printf("sizeof *ptr = %zu\n", sizeof *ptr);
    fflush(stdout);
    (*ptr)[50][50] = 1.25;
    printf("(*ptr)[50][50] = %g\n", (*ptr)[50][50]);
}

(It's unfortunate, IMHO, that C11 made VLAs optional.)

This takes advantage of VLAs *without* the risk of undefined behavior
that you can get from allocating a VLA "on the stack".
 

Keith Thompson

Stefan Ram said:
The general rule is that a T* points to an object of size
sizeof( T ).

»Static« means that the size of the buffer is known at
compile time.
[...]

I've seen the word "static" used with that meaning (the Ada standard
uses it that way, for example), but it has a specific meaning in C,
and that's not it. I think you mean "constant".
 
