Don Knuth and the C language

Discussion in 'C Programming' started by jacob navia, Apr 30, 2014.

  1. jacob navia

    jacob navia Guest

    The way C handles pointers, for example, was a brilliant innovation; it
    solved a lot of problems that we had before in data structuring and made
    the programs look good afterwards. C isn't the perfect language, no
    language is, but I think it has a lot of virtues, and you can avoid the
    parts you don't like. I do like C as a language, especially because it
    blends in with the operating system (if you're using UNIX, for example).

    Don Knuth, cited by jacob navia, Apr 30, 2014

  2. Your point?
    Bill Cunningham, Apr 30, 2014

  3. Bill Cunningham, Apr 30, 2014
  4. Kaz Kylheku

    Kaz Kylheku Guest

    Just days ago I quoted this in a comp.lang.misc thread.
    Kaz Kylheku, Apr 30, 2014
  5. Seems to me that the main feature of C and C pointers is that they
    are easy to implement in a small, efficient compiler on small-memory
    machines, generating small, efficient object code.

    One other language with pointers that was in use before C
    was PL/I. PL/I was also the primary language for Multics;
    most of Multics itself was written in it.

    As with Java, there is no PL/I feature similar to incrementing
    or decrementing a pointer. (A C feature that makes buffer
    overflows especially easy.)
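
    A minimal sketch of that hazard (my own example, not from the post;
    the variable names are made up):

    #include <stdio.h>

    int main(void) {
        char buf[4] = "abc";
        char *p = buf;

        p += 3;     /* fine: points at the terminating '\0'           */
        p++;        /* still legal: one past the end of buf           */
        /* *p = 'x';   would be a classic buffer overflow: nothing in
                       the language stops the increment, and the write
                       is undefined behavior                           */
        printf("%s\n", buf);
        return 0;
    }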

    Strings, structures, and arrays are normally passed by descriptor
    in PL/I, such that the called routine has the appropriate bounds
    available.

    Partial qualification of structure references (did anyone ever
    consider adding that to C?) makes it easy to work with complicated
    structures. (A little like Java's this, but more general.)

    Seems to me that if you have more memory available, and aren't
    in so much of a hurry, other pointer implementations aren't
    so bad.

    -- glen
    glen herrmannsfeldt, May 1, 2014
  6. Ian Collins

    Ian Collins Guest

    Can you explain partial qualification of structure references?
    Ian Collins, May 1, 2014
  7. In Java, you can reference instance variables without the this.
    qualifier, as long as there isn't a local variable with the
    same name.

    In PL/I, you can leave out structure qualifiers as long as the
    result isn't ambiguous.

    struct {
        int x;
    } y;

    x = 3; // instead of y.x = 3;

    Or leave out intermediate qualifiers.

    struct {
        struct {
            int c;
        } b;
    } a;
    double c;

    I don't think I ever wrote nested C structs that way before, but
    the PL/I equivalent is:

    DCL 1 A, 2 B, 3 C FIXED BIN(31,0);
    DCL C FLOAT BIN(53);


    Since there is a separate variable named C, either B.C or A.C is a
    partial qualification for A.B.C. If there were no other C, then C
    alone would be enough.

    If a name is appropriately qualified for its nesting level, then
    names at outer nesting levels don't make it ambiguous.

    -- glen
    glen herrmannsfeldt, May 1, 2014
  8. jacob navia

    jacob navia Guest

    On 01/05/2014 04:23, glen herrmannsfeldt wrote:
    In the lcc-win C compiler you can leave out the path to a member of a
    structure, a similar but not identical improvement I took from the
    Plan 9 compiler.

    struct inner { int inner_member;};

    struct outer { struct inner I;};

    struct outer Outer;
    /* ... */
    Outer.inner_member = 23;

    instead of

    Outer.I.inner_member = 23;

    jacob navia, May 1, 2014
  9. C basically says "here's a memory buffer, now do what you want with it".
    Other languages tie you up with safety restrictions, so if an object is a
    string you can only interact with it using the string-handling functions
    provided by the language.
    The hand-holding is tolerable if you're writing fairly short, simple programs,
    but it becomes a real nuisance once programs get above a certain level of
    complexity: they need to interface with other modules which might be
    written using other conventions for types, they need to call modules
    written in other languages, and they need loosely-coupled, generic
    linking, which is best achieved by passing function pointers and void *s.

    For instance, say we want to implement compression. In C, you write
    void *compress(void *data, int len, int *clen)
    and it's just a case of flattening all the data before passing it to the routine.

    In other languages, this is often a lot harder to achieve. Sometimes there's a
    serialise method, and every single object needs to derive from "Serialisable" and
    implement the method correctly - huge scope for an error somewhere. Then it
    will write to a serial_buffer, so the compressor has to take a serial_buffer, so it's
    no longer an independent module.
    Malcolm McLean, May 1, 2014
  10. void * is not a "fudge". It's an inherent feature of the C language. We can pass
    around memory buffers, and either treat them as sequences of bytes, or
    pass them to client code which understands them.
    It's C's way of providing flexibility, loose coupling, and abstraction.

    Now I'm fully aware that a lot of people think that more strictly-typed languages
    are better as projects scale in complexity. I argued the opposite, and gave
    reasons. You're going to have to do better than reiterate the conventional
    wisdom you were maybe taught as an undergraduate if you're going to be
    treated as a poster who deserves any type of respect.

    Strict type-checking is hardly "very clever". It's a common design feature
    of many languages. C is clever in that it provides type checking where the
    programmer is likely to make a mistake (e.g. passing the address of a float
    to a function that expects a double *), but doesn't enforce it where it's
    unlikely he's making a mistake (e.g. if he tries to access a double as a
    sequence of bytes, it's likely that he knows exactly what he's doing).
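
    A small illustration of that trade-off (my own sketch, not from the
    post; the function name is made up):

    #include <stdio.h>
    #include <string.h>

    void scale(double *d) { *d *= 2.0; }

    int main(void) {
        float f = 1.5f;
        double d = 1.5;

        /* scale(&f);  constraint violation: float * is not compatible
                       with double *, so the compiler must complain    */
        scale(&d);     /* fine */

        /* Treating d as raw bytes needs no special permission: */
        unsigned char raw[sizeof d];
        memcpy(raw, &d, sizeof d);
        printf("first byte of d's representation: %02x\n", raw[0]);

        (void)f;       /* keep f "used" so compilers stay quiet */
        return 0;
    }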
    Malcolm McLean, May 1, 2014
  11. Well, for one, you already assume that it can be processed as a
    sequence of 8 bit bytes. Theoretically, you check CHAR_BIT before
    doing that, but that is pretty rare. (A minimal version of such a
    check is sketched below.)

    At the time that PL/I, and even C, were being developed, it wasn't
    so obvious that everyone would use 8 bit bytes. Multics was on
    a 36 bit machine, and other 36 bit machines were reasonably
    popular at the time, such as the IBM 7090 or 7094, and the DEC
    PDP-10.
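
    For reference, a minimal sketch of that CHAR_BIT check (mine, not
    part of the original post):

    #include <limits.h>
    #include <stdio.h>

    #if CHAR_BIT != 8
    #error "this code assumes 8-bit bytes"
    #endif

    int main(void) {
        /* The same test can be done at run time if #error is too blunt. */
        printf("CHAR_BIT = %d\n", CHAR_BIT);
        return 0;
    }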

    The PL/I UNSPEC function and pseudo-variable allow converting
    any type to/from a bit string.

    As someone mentioned compression, say I take the C source for
    the unix compress program and compile it on a PDP-10, so I can
    compress programs and send them to you. With 36 bit words, the
    value of CHAR_BIT might be 9 or 18, and assuming (not likely)
    that the person who wrote compress even checked CHAR_BIT, it
    isn't likely that you will be able to decompress and read
    the files that I send you.

    Seems to me some combination of IBM S/360, S/370 (for larger
    computer installations) and VAX (for smaller ones) popularized
    the 8 bit byte such that the C assumptions made sense.

    -- glen
    glen herrmannsfeldt, May 1, 2014
  12. I'm not sure what you mean by "the C assumptions". As you know,
    C very specifically does not assume 8-bit bytes (though it
    does assume *at least* 8-bit bytes).

    At least some compression algorithms are defined in terms of 8-bit
    bytes (see RFCs 1951 and 1952, for example). That limitation isn't
    imposed by C. No doubt you could define a compression algorithm
    that operates on bit sequences, but then you'd need metadata to
    indicate how many bits of the last byte are part of the data.

    It's not clear (to me) how you'd even copy an 8-bit file to a 36-bit
    system, let alone compress or decompress it, though I presume there
    are ways to do it.
    Keith Thompson, May 1, 2014
  13. I gave the prototype of the function as

    void *compress(void *data, int len, int *clen);

    Now let's suppose that's all the documentation you have, except that you
    also have a matching decompressor.
    The two possibilities are that compress treats data as a bitstream, or as a
    sequence of 8 bit bytes. The compression buffer will have the matching
    assumption. So all you need to do on a system with bytes wider than
    8 bits is

    unsigned char test[4] = {0, 0x100, 1, 0xFF}; /* 0x100 only fits if CHAR_BIT > 8 */
    unsigned char out[40];
    void *comp;
    int clen, olen;

    comp = compress(test, 4, &clen);
    decompress(out, comp, clen, &olen);

    printf("%x %x %x %x\n", out[0], out[1], out[2], out[3]);
    printf("%d compressed %d decompressed\n", clen, olen);

    That little bit of exploratory programming will tell you how byte
    values greater than 255 are handled, and so how to call the function.

    It's easy, it's flexible, it's portable, even if the programmer hasn't
    quite foreseen everything.

    By contrast, if compress is written in another language, on top
    of a serial buffer, then how is that serial buffer going to behave?
    How are the two going to interact?
    Malcolm McLean, May 1, 2014
  14. (snip, I wrote)
    Well, someone previously mentioned that C allows one to consider
    a double as bytes. And yes you can check CHAR_BIT before doing
    that, but most people don't.

    Well, say you are on a 36 bit system and CHAR_BIT is 9.
    Assuming you check it, you can write a compression algorithm that
    will read and write sequences of 9 bit bytes. The C library, then,
    has to read 36 bit words, pass them in as 9 bit bytes, take the 9
    bit bytes that come out, and reassemble them as 36 bit words before
    writing to disk.

    LZW (as used by compress) normally writes out groups of bits between 9
    and 16 bits long. (Lots of shift, and, and or operations; a rough
    sketch of that kind of bit packing appears below.)
    It probably isn't hard to use CHAR_BIT to change that to compress
    an input of 9 bit bytes.

    That is where it gets interesting. I think Kermit knows how to do it.
    DEC systems store ASCII text as five 7 bit characters per word, with
    one bit left over. (Some editors would use that bit to indicate
    that a line had a line number in it.) In the case of ASCII text
    files, you have to consider that when transferring to/from 8 bit
    systems.

    A compressed stream of 9 bit bytes won't work at all when it gets
    to an 8 bit system. The CHAR_BIT minimum of 8 makes no allowance
    for a system where text files are a sequence of 7 bit bytes in 36
    bit words.
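
    For illustration, a minimal sketch of that style of bit packing (my
    own code, not from compress; the widths and names are made up):

    #include <stdio.h>

    /* Append the low `width` bits of `code` to `buf`, starting at bit
       offset *bitpos, least-significant bit first. Assumes 8-bit bytes
       in the output buffer. */
    static void put_bits(unsigned char *buf, unsigned long *bitpos,
                         unsigned code, int width)
    {
        for (int i = 0; i < width; i++) {
            if (code & (1u << i))
                buf[*bitpos / 8] |= (unsigned char)(1u << (*bitpos % 8));
            (*bitpos)++;
        }
    }

    int main(void) {
        unsigned char buf[16] = {0};
        unsigned long pos = 0;

        put_bits(buf, &pos, 0x1AB, 9);   /* one 9-bit code  */
        put_bits(buf, &pos, 0x3FFF, 14); /* one 14-bit code */

        printf("%lu bits packed, first bytes: %02x %02x %02x\n",
               pos, buf[0], buf[1], buf[2]);
        return 0;
    }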

    While in theory C allows for other than 8 bit bytes, it is too
    easy for programmers to assume it, and that, in turn, may have helped
    the popularity of 8 bit systems.

    -- glen
    glen herrmannsfeldt, May 1, 2014
  15. Stefan Ram

    Stefan Ram Guest

    In C one actually only gets the start address, but has to
    learn the size of the buffer by other means. (The size of
    the pointee, which is provided by the type system of C, can
    only be employed for static buffers.)
    Stefan Ram, May 1, 2014
  16. So the compressed data goes into the same buffer as the uncompressed
    data? If compression makes it bigger, where does the extra data go?
    But that's not directly relevant to the current point, so let's ignore
    it for now. (Of course I'd use size_t rather than int, but we can set
    that aside as well.)
    It's unlikely, and frankly unacceptable, that that would be the only
    documentation. The algorithm used needn't be documented, but there had
    better be something that tells me how to use it.

    If it operates on bitstreams (which needn't be a whole number of octets
    or of bytes), how does it tell you how many bits of the final octet or
    byte are part of the bitstream?

    A function with a built-in assumption of CHAR_BIT==8 could exhibit a
    wide variety of behaviors given byte values greater than 255; I'm not
    convinced that examining the output of a single call would be that
    informative.

    As for how the two would interact: I have no idea.
    Keith Thompson, May 1, 2014
  17. Yes, C allows you to consider any object as an array of unsigned
    char, i.e., of bytes; that's how the standard defines "object
    representation".
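
    A quick illustration (mine, not Keith's) of looking at an object's
    representation that way:

    #include <stdio.h>

    int main(void) {
        double d = 1.0;
        unsigned char *bytes = (unsigned char *)&d;

        /* Any object may be read as sizeof(object) unsigned chars. */
        for (size_t i = 0; i < sizeof d; i++)
            printf("%02x ", bytes[i]);
        printf("\n");
        return 0;
    }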

    I still don't understand what you mean by "the C assumption".
    Certainly a lot of C programmers write code that assumes CHAR_BIT==8
    (and depending on the context, that can be a perfectly reasonable
    assumption). The language can't *force* programmers to write code
    that's portable to systems with CHAR_BIT > 8. It provides all
    the tools to do so, but interoperability between, say, 8-bit and
    9-bit systems is trickier.

    So is that what you mean by "the C assumption", that C *programmers*
    make an assumption that isn't imposed by the C standard? If so, that's
    a perfectly valid point, but I wouldn't use that phrase to describe it.
    Keith Thompson, May 1, 2014
  18. James Kuyper

    James Kuyper Guest

    I'm not sure precisely what you mean by that statement. sizeof(*p) can
    only give you the size of the type that p points at; C has no construct
    that tells you the largest integer n such that p[n] can be safely
    evaluated. It's up to the developer to keep track of that information,
    and to choose the methods by which that information is passed around.
    That's equally true whether or not static allocation is used.
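
    One common way of keeping track of it by hand (a made-up sketch, not
    from the post):

    #include <stdio.h>
    #include <stdlib.h>

    /* The pointer alone says nothing about how many elements follow it,
       so carry the count alongside it. */
    struct buffer {
        double *data;
        size_t  count;
    };

    int main(void) {
        struct buffer b;
        b.count = 10;
        b.data = malloc(b.count * sizeof *b.data);
        if (b.data == NULL)
            return EXIT_FAILURE;

        for (size_t i = 0; i < b.count; i++)  /* b.count is the bound */
            b.data[i] = (double)i;

        printf("last element: %g\n", b.data[b.count - 1]);
        free(b.data);
        return 0;
    }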
    James Kuyper, May 1, 2014
  19. "sizeof *ptr" is useful only if ptr points to a single object and
    you're not interested in treating that object as the first element
    of an array. One special case of this is when ptr is a pointer to
    an array, something that's not used very often in practice.

    I think that by "static" you probably mean something whose size
    is determined at compile time; that's not what the word "static"
    means in C.

    Note that you can define a pointer to a VLA type, which opens up
    some interesting possibilities:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t width = 100;
        size_t height = 100;
        typedef double matrix[width][height];
        matrix *ptr = malloc(sizeof *ptr);
        if (ptr == NULL) {
            fprintf(stderr, "malloc failed\n");
            exit(EXIT_FAILURE);
        }
        printf("sizeof *ptr = %zu\n", sizeof *ptr);
        (*ptr)[50][50] = 1.25;
        printf("(*ptr)[50][50] = %g\n", (*ptr)[50][50]);
        free(ptr);
    }

    (It's unfortunate, IMHO, that C11 made VLAs optional.)

    This takes advantage of VLAs *without* the risk of undefined behavior
    that you can get from allocating a VLA "on the stack".
    Keith Thompson, May 1, 2014
  20. [...]

    I've seen the word "static" used with that meaning (the Ada standard
    uses it that way, for example), but it has a specific meaning in C,
    and that's not it. I think you mean "constant".
    Keith Thompson, May 1, 2014