Don Knuth and the C language

Discussion in 'C Programming' started by jacob navia, Apr 30, 2014.

  1. On 03/05/2014 21:47, Keith Thompson wrote:
    Of course modern Windows/MS-DOS allows you to emit an EOF when using the
    keyboard.

    Unix uses Ctrl-D for this purpose. When typing at the keyboard under
    Unix, you hit Ctrl-D to generate an EOF. Ctrl-Z is the same in Windows.
    jacob navia, May 3, 2014

  2. To clarify, I'm talking about reading from a text file with a Control-Z
    character in the middle of it. A simple C program reading from such a
    file via stdin does not read anything after the Control-Z.
    Keith Thompson, May 3, 2014

  3. It would be better if it allowed you to close the input. Emitting an
    EOF means there must be something *in* the stream that marks the end,
    and that's the very problem being discussed.
    No, they are very different. A Unix terminal often has some character
    (you can set it) that closes the stream. You can nominate which
    character to use, you can turn the mechanism off if you want, and you
    can use a quote character to send the nominated character to the stream.
    Whatever the nominated character is, it never gets into the stream.
    The C IO library never sees it (in fact, Unix read operations won't see
    it either). In particular, there is no special meaning of Ctrl-D in an
    input stream or file.
    Ben Bacarisse, May 3, 2014
  4. Control-D, or 4, is EOT, "end of transmission". Control-C is "end of text".
    Malcolm McLean, May 3, 2014
  5. In ASCII and derivative character sets, to be sure.
    But irrelevant to Ben's point.

    a) In Unix, the keystroke that signals to the I/O system that the terminal
    input device is at End-of-file is configurable; by convention, Unix users
    set this value to ^D (ASCII EOT), but the input system does not restrict
    the value to just ^D. Just as easily, the end user (or his programmatic
    proxy) can set this "end-of-file" character to ^Z or ^X or even ^N.

    b) In Unix, End-of-file is a *condition*, not a datum. Terminal devices are
    treated specially by the underlying OS, in that the OS looks for a specific
    input datum (such as ^D) in order to trigger the condition. The input datum
    is discarded, and the OS reports "End-Of-File" to any programmatic input
    requests. This differs from what CP/M (and versions of MS-DOS) did: they
    actually embedded a character datum in the data stream, and left it to the
    input program to interpret that datum. ^Z to a CP/M text program was a byte
    of 0x1A in the reading program's input stream, and it was up to the reading
    program to interpret that value as "Oh, I've hit the end of valid data.
    There may be more to read, but I really shouldn't."
    Lew Pitcher, May 4, 2014
  6. I think you missed the point. Jacob might think that Ctrl-D (or some
    other control character) has a particular meaning for Unix, but it does
    not. What ASCII chooses to call it is neither here nor there.
    Ben Bacarisse, May 4, 2014
  7. I don't think the behavior of a shell is relevant.

    On Unix, if a program is reading from a keyboard, typing Ctrl-D usually
    triggers an end-of-file condition, regardless of whether the program was
    invoked from a shell or not. The shell is just another program.

    And, as already noted, an equivalent character in a disk file is just
    another character; the special treatment of Ctrl-D applies only when
    reading from a terminal device (a "tty" in Unix parlance).

    "cat" is not, as far as I know, a standard Windows program. I have it
    on my system, but only as part of add-on POSIX support packages (Cygwin
    and GOW), and it follows POSIX semantics.

    A C program on Windows, reading from a disk file in text mode, will
    trigger an end-of-file condition when it encounters a Ctrl-Z character.
    I don't believe this has anything to do with the shell.

    A sample program to test this behavior:

    #include <stdio.h>
    #include <assert.h>

    int main(void) {
        FILE *f;
        int result;
        const char *const filename = "tmp.txt";
        int saw_A = 0;
        int saw_Ctrl_Z = 0;
        int saw_Z = 0;
        int c;

        /* Write "A", a Ctrl-Z (0x1A) and "Z", each on its own line. */
        f = fopen(filename, "w");
        assert(f != NULL);
        fprintf(f, "A\n");
        fprintf(f, "%c\n", 26); /* Ctrl-Z */
        fprintf(f, "Z\n");
        result = fclose(f);
        assert(result == 0);

        /* Read it back in text mode and record which characters arrive. */
        f = fopen(filename, "r");
        assert(f != NULL);
        while ((c = fgetc(f)) != EOF) {
            switch (c) {
            case 'A':
                saw_A = 1;
                break;
            case 'Z':
                saw_Z = 1;
                break;
            case 26:
                saw_Ctrl_Z = 1;
                break;
            }
        }
        result = fclose(f);
        assert(result == 0);

        printf("saw_A = %d\n", saw_A);
        printf("saw_Z = %d\n", saw_Z);
        printf("saw_Ctrl_Z = %d\n", saw_Ctrl_Z);
        return 0;
    }

    On Windows, compiled with MSVC 2010 Express, the output is:

    saw_A = 1
    saw_Z = 0
    saw_Ctrl_Z = 0
    Keith Thompson, May 4, 2014
  8. It is one that you don't think about so often, as long as it is
    working, but, yes, I believe it is the tty (real or virtual)
    device driver that does it.

    The stty and tset commands change ioctl bits on the appropriate device.

    Some other tty device characteristics also seem like they would belong
    to the shell, such as my old favorite tostop, related to output from
    background jobs.

    -- glen
    glen herrmannsfeldt, May 4, 2014
  9. On 04/05/2014 02:11, Ben Bacarisse wrote:
    Ctrl-D (with the default terminal settings) means:

    Discard the Ctrl-D and put the input file in the EOF condition.

    The same as in Windows when using the keyboard.

    The difference is that under Unix that is configurable and not under
    Windows.

    The Ctrl-Z is an EOF character when opening the file in TEXT mode.

    I built a text file with embedded Ctrl-Z characters on my Mac and
    copied it to my Windows machine. Some editors would not read beyond the
    Ctrl-Z because they opened the file in text mode.

    Wedit, the editor of lcc-win, will read the whole file, ignoring the
    Ctrl-Z directive and interpreting it as the character 26. Why?

    Because Wedit opens the file in BINARY mode.
    jacob navia, May 4, 2014
  10. Yes, no one disputes that -- it can mean something to the tty driver.
    It has no meaning in input streams or in files.
    No. It's different in a very significant way.

    I am sure you know exactly how it is different but you want to suggest
    otherwise for some bizarre reason.
    And that it is not part of the input, so it cannot have any meaning in
    files (or any input stream).
    Yes, on Windows. It's an actual character embedded in the input that
    has an effect on the IO layer of programs reading the data as text.
    Quite unlike Ctrl-D in Unix.
    Yes, of course. It would be staggering if it had the same meaning in
    binary mode -- Windows would be unusable if that were the case.
    Ben Bacarisse, May 4, 2014
  11. No, the shell is not relevant here, neither in Unix nor in Windows.
    Yes. This is the key issue. An actual ASCII character, embedded in an
    input stream, is taken to mark the end of that stream by some IO
    functions. Obviously not in all cases -- Windows can open and read
    arbitrary data or it would be quite useless -- but it has stuck with
    honouring this inherited usage for text streams.

    You say it is the C run-time library that is responsible, and you may
    well be right. I certainly thought that was the case, but I don't know
    the details well enough to say so with any real confidence. And of
    course the exact consequences depend on what things use the C run-time.
    I'm not sure why you took my previous remarks to mean that I did not get
    it. Did it sound like I was suggesting that it was wired into Windows
    at some deeper level than the C run-time? If so, you are right. It may have
    been once, but I think modern Windows native file IO routines ignore
    Ctrl-Z (I am not an expert on Windows).
    Sure. That should have been made clear. It's the behaviour of some IO
    libraries (the C one is the topical one here) under Windows that is the
    I can't see how or why the shell has anything to do with it.
    Ben Bacarisse, May 4, 2014
  12. Only if there are no characters waiting to be passed to the process.
    Otherwise it is discarded and any waiting characters are sent. Typing
    "abc^D" does not result in an EOF condition.

    What's more, the "EOF condition" only exists at the stdio level.
    The underlying read() system call merely returns 0, and further
    reads may return more data. And some stdio implementations (in
    particular Linux) do not correctly implement the EOF condition.

    -- Richard
    Richard Tobin, May 4, 2014
  13. No but, often, it flushes the (pseudo-) tty input so the program gets
    it, and a second one then causes the input to be closed. The rule is, I
    think, that Ctrl-D (or whatever) closes the input only if there is no
    pending input. Making it "push" any such input is an obvious extension
    (but I said "often" because I don't know how widespread this behaviour

    Ben Bacarisse, May 4, 2014
  14. Agreed, since EOF is the name of a macro defined in <stdio.h>, whose
    value is not a character value.

    On the other hand, "EOF" can also be used simply as an abbreviation for
    the phrase "End Of File", so it's common (and not entirely incorrect) to
    refer to Ctrl-D or Ctrl-Z as an "EOF character". Such usage can be
    particularly confusing in the context of C, as in this newsgroup.

    The Linux documentation for "stty" wisely refers to "eof", not "EOF".
    Ok ... N1570 7.21.1p3:

    The macros are
    ...
    EOF
    which expands to an integer constant expression, with type int and a
    negative value, that is returned by several functions to indicate
    end-of-file, that is, no more input from a stream;

    I've never seen an implementation where EOF has a value other than -1,
    and there are good reasons to use that specific value. For example, the
    is*() and to*() functions in <ctype.h> accept either a value in the
    range 0..UCHAR_MAX *or* the value of EOF; giving EOF a value adjacent to
    that range makes the implementation slightly more straightforward.
    Right. An end-of-file condition can be triggered either when a
    particular character appears in the input (Ctrl-D or whatever it's
    configured to when reading from a tty on Unix-like systems, Ctrl-Z from
    the keyboard or in a file on Windows), *or* when there's no more input
    to be read.
    If you don't want Ctrl-Z's to be treated as an end-of-file marker, then
    arguably you're not dealing with text files, so of course binary mode is
    Keith Thompson, May 4, 2014
  15. How is the implementation incorrect?
    Keith Thompson, May 4, 2014
  16. It could be a C program running on Windows, compiled with *any*
    C compiler but using Microsoft's C runtime library.

    In the Unix-like world (which I'm more familiar with), different
    compilers typically use the C runtime library provided by the OS.
    In the Windows world, as I understand it, the C runtime library
    isn't as closely tied to the OS; some compilers might generate code
    that uses the Microsoft CRT, others might provide their own.

    Apparently not. On Unix-like systems, the word "shell" has a
    very specific meaning, and it's not what you were referring to.
    It's probably similar on Windows. As you wrote elsethread,
    "run-time thingy" would have been more accurate.
    The sample program wasn't necessarily a direct response to what you
    wrote -- and not everything in a followup has to be a disagreement.

    The point of the program was to demonstrate the behavior (that Ctrl-Z
    in a file triggers an end-of-file condition) as clearly as possible,
    since there has been some confusion between the response to a Ctrl-Z
    (more precisely '\x1a') character in a file and the behavior of
    Ctrl-Z in keyboard input.
    Keith Thompson, May 4, 2014
  17. On 04/05/2014 03:09, Keith Thompson wrote:
    Hi kiki!

    lcc64 ctrlz.c // Compile it with a good compiler
    lcclnk64 ctrlz.obj // Link it with a good linker
    ctrlz.exe // Execute
    saw_A = 1
    saw_Z = 1
    saw_Ctrl_Z = 1

    You are using the wrong compiler kiki

    Yours sincerely

    jacob :)

    P.S. I did not carry on the CP/M tradition. It was a sad decision but I
    fear people that use lcc-win do not want CP/M backwards compatibility.

    Of course all my efforts to build a reasonable C compiler will be
    ignored. Off topic, shrewd businessman trying to sell his wares, etc.

    Go ahead kiki
    jacob navia, May 4, 2014
  18. The TTY eof character, usually 4, is simply a command in
    "canonical input mode" which means: "stop waiting for
    characters and return immediately". The command itself
    is consumed.

    The eof effect comes as a consequence of this command
    being issued at the start of an input line. No characters
    have been accumulated, and so the read returns zero.
    A zero length read resembles the end of a file.
    Kaz Kylheku, May 4, 2014
  19. In ASCII (and derivatives), 0x1a (aka ^Z) has been given the mnemonic "SUB",
    with the explanation: "SUB is used in the place of a character that has
    been found to be invalid or in error. SUB is intended to be introduced by
    automatic means."

    "Unknown character" would fit the intent of the SUB (0x1A ^Z) character.
    Lew Pitcher, May 4, 2014
  20. Apart from MSVC, which apparently gives 1,0,0, that is also the result with
    gcc, PellesC, DMC, Clang and g++, all running on Windows.

    (gcc under Linux gave 1,1,1.)

    So lcc-win is the odd-one-out, in text mode.

    (In binary mode, which I generally use, that gives 1,1,1 always.)
    BartC, May 4, 2014