Don Knuth and the C language

Discussion in 'C Programming' started by jacob navia, Apr 30, 2014.

  1. jacob navia

    James Kuyper Guest

    As a general rule in this newsgroup, when using a term for which the C
    standard provides its own definition, it's best to avoid the confusion
    that can be caused by using that term with a conflicting meaning. Jokes
    to the contrary notwithstanding, the C standard only provides a few
    different definitions for 'static', and that isn't one of them.

    A C term that comes close to that meaning is defined in 6.2.5p23: "A
    type has _known constant size_ if the type is not incomplete and is not
    a variable length array type." I've used underscores to indicate that
    "known constant size" is italicized, an ISO convention indicating that
    this sentence defines the meaning of that phrase.
    For instance, 'buf' is 'static' according to your definition, but it
    isn't 'static' according to any of the C standard's several definitions.
    However, *buf does have a known constant size.
     
    James Kuyper, May 1, 2014
    #21

  2. jacob navia

    Stefan Ram Guest

    »All objects with static storage duration shall be initialized
    (set to their initial values) before program startup.« (N1570)

    This hints at the idea: »static« ~ »before program startup«.
    Also, think about why »static_assert« is called »static_assert«.

    N1570 always carefully distinguishes between the English word
    »static« and the C keyword »static«.

    An object whose identifier is declared without the
    storage-class specifier _Thread_local and without »static«
    and either with external or internal linkage has static
    storage duration, but it is not declared with the keyword
    »static«.
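
    A minimal sketch of that distinction, for illustration only (the
    names file_scope_counter, internal_counter and bump are made up here
    and are not from the thread). All three objects have static storage
    duration, but only two of them are declared with the keyword:

```c
/* External linkage, declared WITHOUT the keyword: static storage duration. */
int file_scope_counter;

/* Internal linkage, declared WITH the keyword: static storage duration. */
static int internal_counter;

/* Hypothetical helper: the block-scope object `calls` has no linkage, and
   here the keyword is what gives it static storage duration. */
int bump(void)
{
    static int calls;   /* zero-initialized before program startup */
    return ++calls;
}
```

    Because all three are initialized before program startup, each
    starts out as zero, and `calls` keeps its value between calls.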
     
    Stefan Ram, May 1, 2014
    #22

  3. jacob navia

    James Kuyper Guest

    But it also uses the English word (NOT the keyword) with a specialized
    meaning in C, in the phrase "static storage duration". One could
    reasonably assume, as I did, that you were using the term "static
    buffer" to mean a buffer that had "static storage duration". The phrase
    "known constant size", being precisely defined by the standard, and
    closely connected to the point you were making, would have been a much
    better term to use (though it's not a drop-in replacement for "static" -
    the sentence would require significant rearrangement).
     
    James Kuyper, May 1, 2014
    #23
  4. The standard usually uses the English word "static" to refer to the
    storage duration. The keyword is (ab)used for a couple of other
    meanings.

    Your statement upthread was:

    In C one actually only gets the start address, but has to
    learn the size of the buffer by other means. (The size of
    the pointee which is provided by the type system of C can
    only be employed for static buffers.)

    I can't think of any meaning of "static", consistent with the C
    standard's usage of the word or not, for which that statement is true.
    Consider:

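    #include <stdio.h>
    #include <stdlib.h>
    int main(void) {
        /* *p0 has an array type of known constant size. */
        int (*p0)[10] = malloc(sizeof *p0);
        /* *p1 has a variable length array type; its size is fixed at run time. */
        int random_size = rand() % 10 + 10;
        int (*p1)[random_size] = malloc(sizeof *p1);
        printf("p0 points to a %zu byte object\n", sizeof *p0);
        printf("p1 points to a %zu byte object\n", sizeof *p1);
        free(p0);
        free(p1);
    }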

    Nothing here has static storage duration. Are both *p0 and *p1
    "static buffers"? If so, what exactly do you mean by that?
     
    Keith Thompson, May 1, 2014
    #24
  5. (snip)
    (snip, I wrote)
    (snip, I also wrote)
    It started as I was trying to understand how C pointers compare
    to PL/I pointers. PL/I doesn't really have anything like
    (unsigned char *), though you can have arrays or strings of CHAR.
    (While PL/I wasn't all that popular, it was for some time one
    of the more popular languages with pointers.)

    The PL/I way is with variables, or strings of type BIT, which have
    many of the same properties as CHAR, such as the ability to use
    SUBSTR and string concatenation for substrings.

    If you have a FLOAT BIN(21) and a FIXED BIN(31,0) (That is, 32 bit
    floating and fixed point values, with a little luck) you can

    DCL I FIXED BIN(31,0), X FLOAT BIN(21);
    X=3.14;
    UNSPEC(I)=UNSPEC(X);

    Where UNSPEC on the right converts to a bit string, and on the
    left converts a bit string back to a non-BIT type.
    (And no problem with alignment that could happen in other
    ways of doing the assignment.)
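
    For comparison, a rough C counterpart of the UNSPEC assignment,
    sketched with memcpy so that alignment is not an issue either. The
    helper names float_bits and bits_to_float are hypothetical, and the
    sketch assumes float is exactly 32 bits wide (checked at compile
    time):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of a float into a
   32-bit unsigned integer, roughly what UNSPEC(I)=UNSPEC(X) does. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    _Static_assert(sizeof(uint32_t) == sizeof(float),
                   "this sketch assumes a 32-bit float");
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical inverse: rebuild the float from its stored bits. */
float bits_to_float(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

    On an implementation using IEEE 754 single precision, float_bits(1.0f)
    yields 0x3F800000; other floating-point formats give other patterns.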

    On the other hand, one tends to write more efficient bit-moving
    code in C, as long as one can work with more than one bit at
    a time. For PL/I, you hope that the compiler figures out where
    the byte (or word) boundaries are and does efficient moves, but
    you can't usually be sure.

    You can shift and AND to extract and insert bits into an 8 bit
    char, but the operations are enough different for a 9 bit char
    that, pretty much, no-one will write code to do it.
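
    The shift-and-AND approach might look like the sketch below. The
    names get_bit and set_bit are hypothetical; the code uses CHAR_BIT
    rather than a literal 8, so at least in principle it does not care
    how wide a char is:

```c
#include <limits.h>
#include <stddef.h>

/* Read bit number `pos`, where bit 0 is the least significant bit of buf[0]. */
int get_bit(const unsigned char *buf, size_t pos)
{
    return (buf[pos / CHAR_BIT] >> (pos % CHAR_BIT)) & 1u;
}

/* Set (value != 0) or clear (value == 0) bit number `pos`. */
void set_bit(unsigned char *buf, size_t pos, int value)
{
    unsigned char mask = (unsigned char)(1u << (pos % CHAR_BIT));

    if (value)
        buf[pos / CHAR_BIT] |= mask;
    else
        buf[pos / CHAR_BIT] &= (unsigned char)~mask;
}
```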

    -- glen
     
    glen herrmannsfeldt, May 1, 2014
    #25
  6. jacob navia

    Kaz Kylheku Guest

    You can use unsigned char * instead of void *.

    The benefits are:
    - cast required in both directions, so more safety.
    - ready for byte access and arithmetic: cumbersome
    conversions that do not add safety are eliminated.

    I have experimented with using unsigned char * as a generic pointer
    to any object: for allocator returns, polymorphism such as
    the context for callbacks and so on. It is perfectly fine.
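
    A small sketch of what that can look like, under assumptions: the
    names alloc_bytes, struct point and sum_point are invented here for
    illustration and are not from the post.

```c
#include <stdlib.h>

struct point { int x, y; };

/* Hypothetical allocator wrapper that hands back unsigned char *
   instead of void *. The caller must free() the result. */
unsigned char *alloc_bytes(size_t n)
{
    return malloc(n);   /* void * still converts implicitly here */
}

/* Hypothetical callback taking its context as a generic unsigned char *. */
int sum_point(unsigned char *context)
{
    /* Unlike void *, converting back to the real type needs a cast. */
    struct point *p = (struct point *)context;
    return p->x + p->y;
}
```

    The caller also has to cast on the way in, for example
    sum_point((unsigned char *)&pt), which is the "cast required in both
    directions" point above.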
     
    Kaz Kylheku, May 1, 2014
    #26
  7. The function returns a void *. So it's a pretty fair guess that it
    mallocs a buffer, returns it, and writes the length to clen. It could
    pathologically return a pointer to a static buffer, but few real
    programmers would be that stupid.
    People aren't perfect. "How does this function behave if CHAR_BIT is not 8"
    is something that is quite likely not to be documented. A good language is
    one which is robust to a bit of sloppiness, poor design, people not
    documenting things or even misdocumenting things.
    For the compressor, it would almost certainly pad the input bitstream
    to a whole number of bytes. So you get a few trailing clear bits at the
    end if you try to compress a bitstream that's not a multiple of bytes.
    So the caller has to set up his bitstream so that it can tolerate
    trailing clear bytes.
    For the compressed stream, it is a bitstream set up so that it tolerates
    trailing clear bits. Typically there's a sentinel sequence to indicate
    end of data.
    You probably want to pass it a slightly longer sequence to be absolutely
    sure. If you can compress and recover 0x100 then it's unlikely that
    the system has an assumption that CHAR_BIT is 8, however.
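
    The arithmetic behind padding to a whole number of bytes can be
    sketched as follows (bytes_for_bits and padding_bits are hypothetical
    names, not part of any compressor discussed here):

```c
#include <limits.h>
#include <stddef.h>

/* Number of bytes needed to hold `nbits` bits, rounding up. */
size_t bytes_for_bits(size_t nbits)
{
    return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}

/* Number of trailing clear bits added by that rounding. */
size_t padding_bits(size_t nbits)
{
    return bytes_for_bits(nbits) * CHAR_BIT - nbits;
}
```

    With CHAR_BIT equal to 8, a 9-bit stream and a 16-bit stream both
    occupy two bytes, so the bit count has to be carried some other way
    if the two are to be told apart.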
     
    Malcolm McLean, May 2, 2014
    #27
  8. Having to guess is unacceptable. If a function allocates a buffer by
    calling malloc() and doesn't document the fact that the caller will have
    to free() it, I won't be using that function, thankyouverymuch.
    What does the language have to do with whether a function is documented?
    If it operates on bitstreams, but it doesn't distinguish between a
    bitstream consisting of 9 bits and one consisting of 16 bits, with the
    last 7 equal to 0, then it's not a valid compression function. Unless
    it's meant to be lossy -- something that would need to be mentioned in
    the documentation if there were any.
    And if that sentinel sequence occurs as valid data in the middle of the
    bitstream? Or is it not intended to operate on arbitrary data?
    I can't be sure what the function does in the normal case, where
    CHAR_BIT==8. If I had to compress data and decompress data on a 9-bit
    system I'd find something else to use. If I *had* to use this one for
    some reason, I'd want to examine the source code and/or perform very
    thorough testing; seeing it behave sensibly with a byte value of 0x101
    wouldn't be enough to give me confidence that it won't corrupt my data.

    In real life, such functions *do* have documentation -- perhaps good,
    perhaps bad, perhaps incomplete, but more than just a bare declaration.

    For this hypothetical example, and for the sake of discussion, I'd be
    willing to accept that documentation does exist, and that it describes
    the behavior adequately and correctly. Lacking that, I see little
    reason to consider using it.
     
    Keith Thompson, May 2, 2014
    #28
  9. It is and it isn't. It's not "lossy", that has another meaning. The last
    few bits are often a problem for a bitstream, because conventional
    backing store interfaces don't normally allow for storage of a specified
    number of bits. So the true end of data is going to have to be tagged
    specially, somehow.
    A bitstream is data, not random bits. So a sentinel is like a zero in
    a string. If you need to represent a string with embedded zeroes, you can
    have an escape. But it has to be parsed by something which understands
    it. As a bitstream gets passed about on systems with varying byte sizes,
    it will tend to accumulate trailing bits, inevitably. Until it is
    parsed and trimmed back to its genuine size. Unlikely to be much of
    a practical problem, and we're only talking about one or two bytes
    each time.
    Any function can have bugs. The test tells you that CHAR_BIT isn't
    hard-coded to 8, it treats larger bytes as larger. There might be
    more bugs lurking there, for example if it uses a "rack" of 32 bits, and
    bytes are also 32 bits long, the "rack" might be too short. But that's
    true of almost any function written in any language.
    If you can employ perfect programmers who never make any mistakes, then
    it really doesn't matter much what language you use. They never make
    mistakes, so everything will always go very smoothly.
    The question is how the language responds to a programmer being sloppy,
    or miscommunication (meticulous documentation, but in Chinese), or
    designs not being done, or being compromised by urgent changes to
    requirements.
    We see that, given a difficult situation (an undocumented compress
    function and a system which doesn't use 8 bit bytes), C doesn't respond
    too badly. We can work out how the function works relatively easily,
    we can isolate any bugs / limitations.

    No one's saying that these are ideal circumstances, or that code shouldn't
    be documented.
     
    Malcolm McLean, May 2, 2014
    #29
  10. ....
    You guys really need to get a room!
    Kiki doesn't know how to operate in other than ideal circumstances.

    That's why he prefers this newsgroup to anything resembling the real world.

    --
    CLC in a nutshell.
     
    Kenny McCormack, May 2, 2014
    #30
  11. Chapter & verse, please?

    c89 or c99 or later?

    Please do be specific.

    --
    About that whole "sent His Son to die for us thing", I've never been able
    to understand that one. It's not like Jesus isn't going back to Heaven
    after his Earthly self dies, right? So, having him be executed, and
    resurrect a few days later strikes me as being more akin to spending the
    weekend at the non-custodial parent's house than "dying", doesn't it?
     
    Kenny McCormack, May 2, 2014
    #31
  12. Thanks, that's worth knowing.
     
    Malcolm McLean, May 2, 2014
    #32
  13. Do you mean an even number or a whole number? If it's really an even
    number, why would an odd number of bytes be forbidden?
    Is this actually a standard? Can you cite a reference?

    I'll be pleasantly surprised if there's really just one consistently
    used standard.
     
    Keith Thompson, May 2, 2014
    #33
  14. I'm sure he meant a whole number.
    The only official standard like this that I'm aware of is for hashes and
    block encryption, to pad a variable-sized input up to a multiple of the
    block size, but IIRC it uses one 1 followed by one or more 0s, not one 0
    followed by one or more 1s as given above.

    Note that if your input size is an exact multiple of the block size, you
    end up with an entire block of padding; this is necessary to distinguish
    between a padded input and an unpadded input that happens to end with
    the padding sequence.

    This scheme has become a common convention for similar needs in other
    domains as well, verging on a de facto standard.
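
    A byte-granularity sketch of the "one 1 followed by 0s" idea, for
    illustration only: the marker byte is 0x80 followed by zero bytes.
    The names pad_block and unpad_block are hypothetical, `block` must
    be nonzero, and the buffer must have room for up to `block` extra
    bytes. This is not the exact padding of any particular hash or
    cipher, several of which also append a length field.

```c
#include <stddef.h>
#include <string.h>

/* Append 0x80 and then zero bytes up to the next multiple of `block`.
   Always adds between 1 and `block` bytes. Returns the padded length. */
size_t pad_block(unsigned char *buf, size_t len, size_t block)
{
    size_t padded = (len / block + 1) * block;

    buf[len] = 0x80;
    memset(buf + len + 1, 0, padded - len - 1);
    return padded;
}

/* Strip the padding again. Returns the original length, or (size_t)-1
   if no 0x80 marker precedes the trailing zero bytes. */
size_t unpad_block(const unsigned char *buf, size_t padded)
{
    while (padded > 0 && buf[padded - 1] == 0x00)
        --padded;
    if (padded == 0 || buf[padded - 1] != 0x80)
        return (size_t)-1;
    return padded - 1;
}
```

    An input that is already an exact multiple of the block size gains a
    full block of padding, which is what keeps unpadding unambiguous.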

    S
     
    Stephen Sprunk, May 2, 2014
    #34
  15. There is also the CP/M tradition. The CP/M file system only stores the
    number of blocks, not the number of bytes. For text files, CP/M marked
    the end of the actual text with X'1A' (control-Z).

    For some reason that I never knew, this tradition continued with
    MS-DOS files, even though the file system does count the bytes.

    Even today, it is not unusual to see files with X'1A' at the end,
    and for programs reading text files to consider it the end.
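
    A reader that follows the convention could find the logical end of
    the text along these lines. This is only a sketch: text_length and
    CPM_EOF are hypothetical names, and `stored_len` stands for however
    many bytes the file system reports.

```c
#include <stddef.h>
#include <string.h>

#define CPM_EOF 0x1A   /* control-Z, ASCII SUB */

/* Length of the text up to, but not including, the first control-Z.
   If there is no control-Z, the whole stored length counts as text. */
size_t text_length(const unsigned char *data, size_t stored_len)
{
    const unsigned char *sub = memchr(data, CPM_EOF, stored_len);

    return sub ? (size_t)(sub - data) : stored_len;
}
```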

    -- glen
     
    glen herrmannsfeldt, May 2, 2014
    #35
  16. jacob navia

    Stefan Ram Guest

    In ASCII (1968) we actually have:

    0011010 1A 26 ^Z SUB Substitute
    0011011 1B 27 ^[ ESC Escape
    0011100 1C 28 ^\ FS File Separator
     
    Stefan Ram, May 2, 2014
    #36
  17. ....unless the sequence happens to end with something that can't be
    padding. E.g. using 0+1s as the padding, a sequence that ends xxx0 can
    end on a block/byte boundary without needing any "fake" padding.

    (This is just a clarification. You don't say that every input must be
    padded.)
     
    Ben Bacarisse, May 2, 2014
    #37
  18. Ah, I thought the proposal was for 0 and *one* or more 1s. No idea why
    I thought that, just an incorrect assumption.
     
    Ben Bacarisse, May 2, 2014
    #38
  19. (snip on EOF indication, then I wrote)
    Yes. Well, for CP/M it would only need to be in the last block, but
    I presume that it was actually tested anywhere in the file.
    Nothing against back compatibility, but it is over 30 years now,
    and I am pretty sure that by now no-one is developing on CP/M
    to port to DOS/Windows.
    The one I ran into for a long time was the MS-DOS PRINT spooler,
    at least up to 3.x, and probably longer.

    About 10 years after MS-DOS, I was writing programs to do bit-mapped
    graphics on different printers. Printing stops at X'1A'. For parallel
    printers, you could copy to LPT1, but for serial printers, it didn't
    do any flow control at all, so about the only way was to use the
    print spooler, which did.
    -- glen
     
    glen herrmannsfeldt, May 2, 2014
    #39
  20. [...]

    Even today, a Windows C program reading input in text mode treats
    Control-Z (character 26) as an end-of-file indicator. (I just
    tried it on Windows 7 with MSVC 2010 Express.)
     
    Keith Thompson, May 3, 2014
    #40