Don Knuth and the C language

Discussion in 'C Programming' started by jacob navia, Apr 30, 2014.

  1. jacob navia

    jacob navia Guest

    On 05/05/2014 00:14, BartC wrote:
    uses msvc run time

    PellesC, DMC, Clang and g++, all running on Windows.

    Yes, all are bug compatible with msvc.
    Of course. Reporting a correct result is not bug compatible!

    Yes, lcc-win is an odd compiler. It doesn't treat 26 as EOF but as 26.

    Odd isn't it?
    jacob navia, May 4, 2014

  2. gcc on Windows is often installed as part of a POSIX-like layer
    that imposes its own semantics. There are several ports of gcc
    to Windows, and I doubt that they all behave the same way. And of
    course it's the runtime library, not the compiler, that's relevant;
    some implementations (like MinGW, I think) combine gcc with the
    MS runtime library, others combine gcc with some other library
    (like Cygwin).

    And g++ is not a C compiler, so ...

    There are two major differences between POSIX-style and Windows-style
    text files: the end-of-line representation and the treatment
    of Ctrl-Z as an end-of-file marker. It would be interesting to
    see, for each of the compilers you mention, how it treats both.
    For Ctrl-Z handling, you can use the program I posted earlier.
    For end-of-line representation, you can write '\n' to a text file
    in text mode, then read it back in binary mode.

    I'm not surprised that some implementations might use Windows-style
    handling for end-of-line and POSIX-style handling of Ctrl-Z, nor
    do I suggest that that approach is better or worse than any other.
    And either behavior is perfectly valid as far as the C standard is
    concerned.

    N1570 7.21.2p2:

    A text stream is an ordered sequence of characters
    composed into lines, each line consisting of zero or more
    characters plus a terminating new-line character. Whether
    the last line requires a terminating new-line character is
    implementation-defined. Characters may have to be added,
    altered, or deleted on input and output to conform to
    differing conventions for representing text in the host
    environment. Thus, there need not be a one-to-one correspondence
    between the characters in a stream and those in the external
    representation. Data read in from a text stream will necessarily
    compare equal to the data that were earlier written out to that
    stream only if: the data consist only of printing characters
    and the control characters horizontal tab and new-line; no
    new-line character is immediately preceded by space characters;
    and the last character is a new-line character. Whether space
    characters that are written out immediately before a new-line
    character appear when read in is implementation-defined.

    My advice: Use text mode for text, binary mode for non-text. If you
    care what happens when you write '\x1a' to a file and read it back
    (assuming, as in most commonly used character sets, that '\x1a' is a
    control character), then you're not working with text.

    Admittedly it's not always that simple, especially if you have a
    requirement to deal with "foreign" text files.
    Keith Thompson, May 4, 2014

  3. I haven't thought about this for some years, but didn't DEC systems
    start the use of ^Z for EOF on terminal input streams?

    -- glen
    glen herrmannsfeldt, May 4, 2014
  4. In the olden days, it made sense to talk about "an EBCDIC machine".
    The screen would be memory-mapped to bytes representing EBCDIC
    characters. Most programs would operate via system-level utilities
    which were hardcoded to accept EBCDIC text strings. All the text
    files on the system would be EBCDIC.
    Nowadays, there's still that concept. The filing system will have
    a fixed representation for file names, for example. But it's less
    meaningful than it was. Most programs use raster displays, and it's
    relatively easy to set up a font to display any character set.
    There will be a mixture of text files on the system, downloaded
    from the internet, and users expect that most software will read all
    of the common formats.

    Jacob's approach of reading all files in binary is probably a good
    intermediate step. Long-term, of course, we want all text files
    to return utf-8 strings when read, transparently.
    Malcolm McLean, May 5, 2014
  5. In case there is a linguistic confusion here, saying that something is
    the "odd one out" is not in the least critical and does not mean that
    the thing is odd. Calling a thing "odd" is mildly critical, but being
    the "odd one out" simply means that it is the exception -- exceptional
    if you like.
    Ben Bacarisse, May 5, 2014
  6. Yes, I am pretty sure that TOPS-20, for one, did this with Ctrl-Z. It
    was, however, a command to the TTY driver and had no meaning in a data
    stream. I would put money on that twist being a CP/M invention (or at
    least an invention from a very small system that needed some way to
    signal end-of-data but did not want the full tty driver mechanism).

    Ben Bacarisse, May 5, 2014
  7. Lew Pitcher

    Lew Pitcher Guest

    CP/M 2.2 delivered 128-byte blocks to the function 20 Read Sequential and
    function 33 Read Random BDOS calls
    The A register was set to a non-zero value "if no data exists at the next
    record position (e.g. end of file occurs)."
    Lew Pitcher, May 5, 2014
  8. (snip)
    The IBM printing terminals used with S/360 and later didn't
    use EBCDIC, but there was a conversion along the way.

    I believe line printers like the 1403 have a map somewhere
    indicating which character (bit pattern) is where on the
    print train.

    The 3800 (a laser printer that doesn't look anything like
    one you would put on a desk) does all the character coding
    in software.

    The 2250 and 2260 use hardware character generators, but I
    don't know which code they use.

    And ASCII terminals were commonly used with translation
    somewhere along the way.

    There are some characters in EBCDIC and not ASCII (the not sign and
    the cent sign, for two) and some in ASCII not in EBCDIC (caret and
    tilde).

    The PL/I (F) compiler (the only one I have noted such in) has
    comments in some modules noting that they are character code
    independent, and others noting that:

    -- glen
    glen herrmannsfeldt, May 5, 2014
  9. jacob navia

    jacob navia Guest

    On 05/05/2014 01:20, Ben Bacarisse wrote:
    Yes, this ensures that I remain in his killfile!
    Yes. I rewrote ALL stdio. I am Microsoft clean now. Before, in the 32
    bit version I used MSVCRT.DLL for stdio.
    d:\lcc-src\libc\test>type tstdin.c
    #include <stdio.h>
    int main(void)
    {
        int c;
        while ((c=getchar()) != EOF) {
            /* loop body missing from this copy of the post */
        }
    }

    d:\lcc-src\libc\test>lc64 tstdin.c



    In the first line of input I type "abc" then Ctrl-Z. Nothing happens;
    the Ctrl-Z is ignored.
    In the second line of input I type Ctrl-Z at the START of the line. The
    ReadFile() function of the OS returns an end of file condition.

    Just as in Unix.
    jacob navia, May 5, 2014
  10. Yes, that's the first case above, where there are characters ("abc")
    waiting. They get passed to the program,
    And now there are no characters waiting, so read() returns 0 which
    indicates end of file (it doesn't "close" anything).

    -- Richard
    Richard Tobin, May 5, 2014
  11. If you type the sequence "a", linefeed, end-of-file-character,
    successive calls to getchar() should return 'a', '\n', EOF, EOF, EOF,
    ... because the EOF condition should be set and remain set until
    clearerr() is called.

    However, on Linux (I think I should really say "using glibc") only
    one EOF is returned and the system waits for another character to be
    typed.

    There is a discussion about fixing it at

    -- Richard
    Richard Tobin, May 5, 2014
  12. Noob

    Noob Guest

    (Waaay off-topic)

    Some mail/news clients support so-called "enhanced plain-text features"
    to display *bold* /italic/ or _underlined_ text.

    Noob, May 5, 2014
  13. James Kuyper

    James Kuyper Guest

    I know - I've got that feature turned off, because I've seen too many
    messages become unreadable when viewed using an incompatible client when
    it's turned on. Messages using none of the enhanced features are
    readable everywhere, though, as shown above, there is some corresponding
    James Kuyper, May 5, 2014
  14. Ken Brody

    Ken Brody Guest

    Even more fun (FSVO) are clients that turn plain text into emoticons, making
    things such as:


    rather "interesting" to read.
    Ken Brody, May 5, 2014
  15. Using the compiler that came with VS 2010 ("Microsoft (R) 32-bit C/C++
    Optimizing Compiler Version 16.00.40219.01 for 80x86"), I got a 1, 0, 0
    but I got a 1, 1, 1 using gcc 4.8.2 on i686-pc-cygwin.

    - Anand
    Anand Hariharan, May 7, 2014
  16. Also in C++. In both cases only in non-static methods of course.
    COBOL also allows it, and I'm nearly certain that's where PL/I got it.
    (To a first approximation PL/I is COBOL + FORTRAN + spices, beat well
    and bake until crisp.) Similarly the PL/I syntax for declaring
    structures is much closer to COBOL than the algol ... C tribe.

    <snip rest>
    David Thompson, May 25, 2014
  17. (snip, I wrote)
    and also ALGOL
    Yes. As I understand it, COBOL allows only 1D arrays, but you can
    build array structures of arrays of ... to an appropriate depth.

    So, partial qualification is a convenient way not to have to write
    all the qualifiers that you put in only to allow enough dimensions.
    PL/I does allow more than one dimension, though.

    I believe PL/I also inherited the ability to move subscripts around
    between structure qualifiers, or at least move them right.
    (I am not sure if you can move them left or not, I never tried.)

    -- glen
    glen herrmannsfeldt, May 26, 2014